Log Parser Studio Query - IP address by most recently accessed?

I'm trying to look at the IIS logs with Log Parser Studio to determine how recently some IP addresses have accessed a website. I've used the query below to at least get a request count per IP over the life of the logs:
select c-ip, count(c-ip) as requestcount from '[LogFilePath]' group by c-ip order by count(c-ip) desc
I'm having trouble modifying this to pull IP address information with a 'last accessed' date. Is something like this possible? Or is there a better way to go about achieving what I want?
Ideally I'd like to use this query to audit the logs and, after X days of inactivity from an IP address, revoke its access (remove it at the firewall; IP addresses are on a whitelist).
Due to the nature of the website/application, there may be times when an IP doesn't access it for 90-120 days, so a simple 'hit count' doesn't work. It would be easy to mistakenly remove access for a still valid IP address, and the hit count is reset when the firewall is rebooted.
Thanks in advance.

Add MAX(date) to the query, with its own alias so that it does not take over the requestcount name:
select c-ip, count(c-ip) as requestcount, MAX(date) as lastaccessed from '[LogFilePath]' group by c-ip order by MAX(date) desc
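For the audit step described in the question (flagging IPs after X days of inactivity), here is a minimal Python sketch that works outside Log Parser Studio. It assumes W3C extended log lines with a "#Fields:" directive and the date, time and c-ip columns; the function names, the sample log lines and the 120-day threshold are made up for illustration.

```python
from datetime import datetime, timedelta

def last_access_by_ip(lines):
    """Return {c-ip: latest timestamp seen} from W3C extended log lines."""
    fields = []
    last_seen = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The "#Fields:" directive names the columns of the rows that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        row = dict(zip(fields, line.split()))
        if not all(name in row for name in ("date", "time", "c-ip")):
            continue  # skip rows that lack the columns we need
        stamp = datetime.strptime(row["date"] + " " + row["time"],
                                  "%Y-%m-%d %H:%M:%S")
        ip = row["c-ip"]
        if ip not in last_seen or stamp > last_seen[ip]:
            last_seen[ip] = stamp
    return last_seen

def inactive_ips(last_seen, now, max_idle_days):
    """Return the IPs whose last access is older than max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(ip for ip, stamp in last_seen.items() if stamp < cutoff)

# Hypothetical sample data (documentation-range IP addresses).
SAMPLE_LOG = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2023-01-05 10:00:00 203.0.113.7 GET /index.html 200",
    "2023-06-01 08:30:00 198.51.100.4 GET /index.html 200",
    "2023-02-10 12:15:00 203.0.113.7 GET /report 200",
]

last_seen = last_access_by_ip(SAMPLE_LOG)
stale = inactive_ips(last_seen, datetime(2023, 7, 1), 120)
print(stale)
```

Unlike a firewall hit counter, this depends only on the log files, so a firewall reboot does not reset it.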

Related

How do I count the number of occurrences of a value in Influx? (GROUP BY?)

I have a bunch of data in InfluxDB about "articles", and each article has a domain value associated with it.
I was hoping to display the number of articles, by domain. I might be wrong, but I feel like what I want in SQL is this:
select domain, count(*) from articles group by domain;
However, this gives me this error:
ERR: mixing aggregate and non-aggregate queries is not supported
What am I doing wrong?
I suspect that your domain is a field and not a tag. You can only group by a tag (and time intervals).
If your domain was a tag there would be no problem with the query. Although selecting the tag could be redundant since it is already included in the results as the group identifier.
I think Jan Garaj's comment is somewhat inaccurate.
I don't know why you don't get an error that you can't group by domain if it is in fact a field. But perhaps the query parser is just first complaining about you having a field that is in fact not being aggregated in any way when it should be.
EDIT: but to address your question, the way you approach it is the right way.
You need to use the correct InfluxDB syntax for GROUP BY (the tag is not listed in the SELECT clause):
select count(*) from articles group by domain;

Group by in Influx DB like MySql

I am very new to InfluxDB and curious how GROUP BY works. For example, how can I execute the following MySQL query in InfluxDB:
select mean(cputime), vm from CPU group by vm;
First of all, please remember: this is NOT a relational database, and InfluxQL is NOT SQL (even though it looks so familiar).
In particular, here:
1) You won't be able to select aggregate and non-aggregate values at the same time (whatever the non-aggregate is, field or tag). Yes, even with grouping.
2) Effectively, you can group only by tags (plus a special kind of grouping by time intervals).
So, even assuming "vm" is a tag, your query is not valid.
This, on the other hand,
select mean(cputime) from CPU group by vm
is valid, but I'd strongly discourage you and anyone else from running queries without time restrictions: aside from being fairly meaningless, as the time series grows it is going to slow everything down dramatically.
So something like this:
select mean(cputime) from CPU where time > now() - 15m group by vm
or even this:
select mean(cputime) from CPU where time > now() - 90m group by time(15m), vm
would be much better.
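To make the last query more concrete, here is a rough Python sketch of what grouping by time(15m) and by the vm tag computes. It assumes points arrive as (unix_seconds, vm, cputime) tuples and that windows are aligned to multiples of the window size; the function name and sample points are made up. Unlike InfluxDB, the sketch simply omits windows that contain no points.

```python
from collections import defaultdict

def mean_by_window_and_tag(points, window_seconds):
    """Average cputime per (window start, vm) pair.

    points: iterable of (unix_seconds, vm, cputime) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window start, vm) -> [total, count]
    for ts, vm, cputime in points:
        bucket = ts - ts % window_seconds  # start of the window containing ts
        acc = sums[(bucket, vm)]
        acc[0] += cputime
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical sample points.
points = [
    (0, "vm1", 10.0),
    (60, "vm1", 20.0),
    (900, "vm1", 30.0),
    (120, "vm2", 5.0),
]

result = mean_by_window_and_tag(points, 15 * 60)
for key in sorted(result):
    print(key, result[key])
```

Each (window, vm) pair yields one mean, which is why the tag is not listed in the SELECT clause: it identifies the group rather than being a selected value.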

How to avoid ResourcesExceeded in BigQuery with JOINs

I have the following query
SELECT domain, path, name, title_part, name_part FROM dataset.match_making_name_parts
LEFT JOIN dataset.raw_title_parts ON LOWER(title_part) = LOWER(name_part)
The problem I am trying to solve is matching a list of names (of assumed owners of specific websites) against names mentioned in the tags and other such places in the raw HTML of a website at a given domain and path, which we have crawled.
I want to build a map of partial matches, so I can rank the matches according to some heuristics (how many name parts of the full name got matched, in how many uris and were there matches with other domains).
Example rows:
//raw_title_parts (domain, path, title_part):
cloud.google.com, bigquery/query-plan-explanation, Query
cloud.google.com, bigquery/query-plan-explanation, Plan
cloud.google.com, bigquery/query-plan-explanation, Google
cloud.google.com, bigquery/query-plan-explanation, Cloud
cloud.google.com, bigquery/query-plan-explanation, Platform
//match_making_name_parts (name, name_part):
Google Cloud Platform, Google
Google Cloud Platform, Cloud
Google Cloud Platform, Platform
I assumed that using a JOIN would be easier for the Query Planner, but it turns out that it attempts to run the query with an insane number of stages; the first two and last two perform other tasks, but most of the stages are just doing repartitions:
READ $2, $1, $20
FROM __PSRC___PA1_2
WRITE $2, $1, $20
TO __PA1_REPARTITION2
BY HASH($20)
I have some ideas about alternative solutions to my problem, but this Resources Exceeded error comes around quite often and I would like to understand it more deeply.
What would be the best way to solve my problem in BigQuery? How can I better predict which operations are problematic for BigQuery's Query Planner, and which patterns can I use to avoid these problems?
EDIT: It seems that the problems start escalating relatively late on. With LIMIT 10000000, the query runs in 30 seconds, with LIMIT 1000000000 it no longer runs at all. Considering the numbers, this might be an issue with "memory pagination", that happens behind the scenes (does the set fit in RAM)?
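To make the matching goal concrete outside BigQuery, here is a small Python sketch of the same case-insensitive equality join, run on the example rows above. It counts how many distinct name parts matched per (name, domain, path), which is one of the ranking signals mentioned in the question. The function name is made up, and unlike the LEFT JOIN in the query, name parts without any match are dropped rather than kept with NULLs.

```python
from collections import defaultdict

# Example rows from the question.
raw_title_parts = [
    ("cloud.google.com", "bigquery/query-plan-explanation", "Query"),
    ("cloud.google.com", "bigquery/query-plan-explanation", "Plan"),
    ("cloud.google.com", "bigquery/query-plan-explanation", "Google"),
    ("cloud.google.com", "bigquery/query-plan-explanation", "Cloud"),
    ("cloud.google.com", "bigquery/query-plan-explanation", "Platform"),
]
match_making_name_parts = [
    ("Google Cloud Platform", "Google"),
    ("Google Cloud Platform", "Cloud"),
    ("Google Cloud Platform", "Platform"),
]

def matched_part_counts(name_parts, title_parts):
    """Count distinct matched name parts per (name, domain, path)."""
    # Index the title parts once by lowercased text (a simple hash join).
    pages_by_part = defaultdict(set)
    for domain, path, title_part in title_parts:
        pages_by_part[title_part.lower()].add((domain, path))

    matches = defaultdict(set)
    for name, name_part in name_parts:
        key = name_part.lower()
        for domain, path in pages_by_part.get(key, ()):
            matches[(name, domain, path)].add(key)
    return {k: len(parts) for k, parts in matches.items()}

result = matched_part_counts(match_making_name_parts, raw_title_parts)
print(result)
```

Note that a very common part (for example a frequent word that occurs on many pages) pairs with every page containing it, so the join output can be much larger than either input.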

Stored procedure history and what application is calling it?

I was assigned to research our 200 stored procedures and find out a bunch of different information about them. Is there any way in SQL Server 2012 to pull execution history on stored procedures? Also, is there any way to tell what application might be calling a stored procedure? Even an IP address would be helpful because we have several servers that do various processing.
Any information you can provide me about this would be extremely helpful. I am relatively new to this type of thing in SQL. Thanks!
Is there any way in SQL Server 2012 to pull execution history on stored procedures?
You can use sys.dm_exec_procedure_stats to find stored procedure execution counts and times, including the most time-consuming and CPU-intensive ones. Note that it only covers procedures whose plans are still in the plan cache.
SELECT TOP 10
    d.object_id, d.database_id,
    OBJECT_NAME(d.object_id, d.database_id) AS [proc name],
    d.cached_time, d.last_execution_time, d.total_elapsed_time,
    d.total_elapsed_time / d.execution_count AS [avg_elapsed_time],
    d.last_elapsed_time, d.execution_count
FROM
    sys.dm_exec_procedure_stats AS d
ORDER BY
    d.total_worker_time DESC;
Also, is there any way to tell what application might be calling a stored procedure? Even an IP address would be helpful because we have several servers that do various processing.
The answer to both questions above is no, unless you monitor requests in real time with the query below. You can schedule it with SQL Server Agent at whatever interval you choose and capture the output in a table. Note that this gives you the individual statements inside a stored procedure.
select
    r.session_id,
    s.login_name,
    c.client_net_address,
    s.host_name,
    s.program_name,
    st.text
from
    sys.dm_exec_requests r
inner join
    sys.dm_exec_sessions s on r.session_id = s.session_id
left join
    sys.dm_exec_connections c on r.session_id = c.session_id
outer apply
    sys.dm_exec_sql_text(r.sql_handle) st
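Once the sampled output has been captured in a table, it can be summarized to see which application and client address ran which text. Here is a minimal Python sketch of that summary, assuming the rows have been exported as dictionaries keyed by the column names of the query above. The function name and every sample value (procedure names, program names, addresses) are made up for illustration.

```python
from collections import Counter

def callers_by_text(samples):
    """Count captured samples per (text, program_name, client_net_address)."""
    counts = Counter()
    for row in samples:
        key = (row["text"], row["program_name"], row["client_net_address"])
        counts[key] += 1
    return counts

# Hypothetical captured samples.
samples = [
    {"text": "EXEC dbo.usp_LoadOrders", "program_name": "OrderService",
     "client_net_address": "10.0.0.5"},
    {"text": "EXEC dbo.usp_LoadOrders", "program_name": "OrderService",
     "client_net_address": "10.0.0.5"},
    {"text": "EXEC dbo.usp_Report", "program_name": "ReportJob",
     "client_net_address": "10.0.0.9"},
]

counts = callers_by_text(samples)
for (text, program, address), n in counts.most_common():
    print(n, program, address, text)
```

Because the capture is a periodic sample, short-lived calls that start and finish between two runs will not appear, so the counts indicate who calls what rather than exact call totals.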

Mysql-proxy and Lua how to transmit query to different server for sharding table?

I have split my tables across several DB servers, e.g. table1/2/...10 on serverA and table11/12... on serverB.
What I want to achieve:
Given a SQL query such as: select * from table1 ;
use Lua to forward the query to serverA. If the queried table is on serverB, forward it to serverB instead.
I studied rw-splitting.lua from the proxy docs; it only changes proxy.connection.backend_ndx. I tried changing that in read_query(), but it doesn't work.
To my knowledge, the proxy doesn't give you that functionality. There are commercial products that perform this query routing according to a sharding policy; they can also run queries across all databases and combine the results, reshard the data online when DBs are added or removed, provide monitoring and management of the system, and much more. I recommend you look at Scalebase (disclaimer: I work there) at www.scalebase.com.
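Independent of which proxy or product does the forwarding, the routing decision described in the question can be sketched in a few lines of Python. This is only an illustration of the policy (table1 to table10 on serverA, table11 and up on serverB), not mysql-proxy code. The regular expression handles only a simple "FROM tableN" clause, and the function name is made up.

```python
import re

# Matches the first "from tableN" in a query; assumes tables are named tableN.
TABLE_RE = re.compile(r"\bfrom\s+`?table(\d+)`?", re.IGNORECASE)

def backend_for(query):
    """Return the server that holds the table a simple query reads from."""
    match = TABLE_RE.search(query)
    if match is None:
        return None  # no recognizable table: the caller must pick a default
    table_number = int(match.group(1))
    return "serverA" if table_number <= 10 else "serverB"

print(backend_for("select * from table1 ;"))
print(backend_for("SELECT id FROM table12 WHERE id = 1"))
```

Joins or subqueries that touch tables on both servers cannot be routed this way, which is where cross-database query execution comes in.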
