PRAW: inconsistent behavior when trying to monitor ups/downs - reddit

I am trying to monitor a post, and plot the number of ups and downs over a 24 hour period (at 5 minute intervals). The core of the code looks like this:
import time
import praw

r = praw.Reddit(user_agent='ups-downs-monitor')  # session setup

while True:
    post = r.get_submission(submission_id='23a1zz')
    time.sleep(5)
    post.refresh()
    print post.ups
    time.sleep(5 * 60)  # poll every 5 minutes
However, it does not reflect the true ups and downs. It's stuck at the same number even though the actual post is pretty dynamic.

The API Guidelines state that the same resource shouldn't be requested more often than every 30 seconds. This guideline is backed by a cache on both Reddit's and PRAW's end that returns the same content if the resource is requested again within a short while. See: http://praw.readthedocs.org/en/latest/pages/faq.html#i-made-a-change-but-it-doesn-t-seem-to-have-an-effect
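Since a 5-minute polling interval is well above that 30-second minimum, fetching a fresh submission object on each pass should be enough once the cache window has expired. A minimal sketch of the 24-hour monitoring loop (the user agent string and output file are illustrative):

    import csv
    import time
    import praw

    r = praw.Reddit(user_agent='ups-downs-monitor')   # illustrative user agent

    with open('ups.csv', 'w') as f:                   # illustrative output file
        writer = csv.writer(f)
        for _ in range(24 * 12):                      # 24 hours at 5-minute intervals
            post = r.get_submission(submission_id='23a1zz')
            writer.writerow([time.time(), post.ups, post.downs])
            f.flush()                                 # keep the log current on disk
            time.sleep(5 * 60)                        # well above the 30-second minimum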

Related

Getting Locust to send a predefined distribution of requests per second

I previously asked this question about using Locust as the means of delivering a static, repeatable request load to the target server (n requests per second for five minutes, where n is predetermined for each second), and it was determined that it's not readily achievable.
So, I took a step back and reformulated the problem into something that you probably could do using a custom load shape, but I'm not sure how – hence this question.
As in the previous question, we have a 5-minute period of extracted Apache logs, where each second, anywhere from 1 to 36 GET requests were made to an Apache server. From those logs, I can get a distribution of how many times a certain requests-per-second rate appeared; e.g. there's a 1/4000 chance of 36 requests being processed on any given second, 1/50 for 18 requests to be processed on any given second, etc.
I can model the distribution of request rates as a simple Python list: each number between 1 and 36 appears in it as many times as that requests-per-second rate occurred in the 5-minute period captured in the Apache logs. The tick() method of a custom load shape can then draw randomly from that list to get a number that informs the (user count, spawn rate) calculation.
Additionally, by using a predetermined random seed, I can make the test runs repeatable to within an acceptable level of variation, which is useful for testing my API server configuration changes, since the same random list elements should be retrieved each time.
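As a sketch of that setup, assuming the per-second request counts have already been extracted from the logs into a histogram (the counts below are made up):

    import random

    random.seed(42)  # fixed seed so each test run draws the same sequence

    # hypothetical histogram: requests-per-second rate -> number of seconds
    # that rate was observed in the 5-minute log window
    rate_counts = {1: 40, 2: 35, 18: 6, 36: 1}

    repro_data = []
    for rate, count in sorted(rate_counts.items()):
        repro_data.extend([rate] * count)

    random.shuffle(repro_data)  # reproducible, randomized replay order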
The problem is that I'm not yet able to "think in Locust", to think in terms of user counts and spawn rates instead of rates of requests received by the server.
The question becomes this:
How do you implement the tick() method of a custom load shape in such a way that the (user count, spawn rate) tuple results in a roughly known distribution of requests per second to be sent, possibly with the help of other configuration options and plugins?
You need to create a Locust User with the tasks you want it to run (e.g. making your HTTP calls). You can define the wait time between tasks to roughly control the requests per second: if you have a task that makes a single HTTP call and define wait_time = constant(1), you get roughly 1 request per second per user (a user class is sketched further below). Locust's spawn_rate is a per-second unit. Since you already have the data you want to reproduce, and it's in 1-second intervals, you can then create a LoadTestShape class with a tick() method somewhat like this:
    from locust import LoadTestShape

    class MyShape(LoadTestShape):
        repro_data = […]   # the per-second request rates built from the logs
        last_user_count = 0

        def tick(self):
            if len(self.repro_data) > 0:
                requests_per_second = self.repro_data.pop(0)
                requests_per_second_diff = abs(self.last_user_count - requests_per_second)
                self.last_user_count = requests_per_second
                return (requests_per_second, requests_per_second_diff)
            return None  # returning None stops the test once the data runs out
If your first data point is 10 requests, you'd need requests_per_second=10 and requests_per_second_diff=10 to make Locust spin up all 10 users in a single second. If the next second is 25, you'd have requests_per_second=25 and requests_per_second_diff=15. In a Load Shape, spawn_rate also works for decreasing the number of users, so if the next value is 16, you'd have requests_per_second=16 and requests_per_second_diff=9.
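For completeness, the corresponding user class might look like this minimal sketch (the target endpoint is illustrative):

    from locust import HttpUser, constant, task

    class ReplayUser(HttpUser):
        wait_time = constant(1)    # one task per user per second, roughly

        @task
        def fetch(self):
            self.client.get("/")   # illustrative endpoint; replay real paths here

With wait_time = constant(1) and one HTTP call per task, the user count returned from tick() maps approximately one-to-one onto requests per second.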

Grafana: Panel with time of last result

I have an elasticsearch instance that receives logs from multiple backup routines. I'd like to query ES for these logs from Grafana and set up a panel that shows the last time for the different backups. Ideally I would also like to be able to show this in color if the time is longer than a certain threshold.
Basically the idea is to have a display that shows, for instance, green if a certain backup has been completed in the last 24 hours, and red if it hasn't.
How would I do this in Grafana with ES as the datasource?
The exact implementation depends on the panel used.
Example for the singlestat panel: write the ES query, then select Stat: Time of last point; you may need to pick a suitable unit/format.
Unfortunately, Grafana doesn't understand thresholds in your requested time format (older than 24 hours). You will need to return the value as a metric (for example, the age of the last backup in seconds), which means writing a query for that. That also means you will have two stats to show (last time + age), so you won't be able to use singlestat; a table panel will probably work better, since it supports thresholding based on the age metric.
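As a sanity check outside Grafana, the age metric can be computed directly against Elasticsearch; a sketch using the elasticsearch Python client (the index, field, and job names are made up):

    import time
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")        # illustrative address

    # newest log entry for one hypothetical backup job
    resp = es.search(index="backup-logs", size=1,
                     sort="@timestamp:desc",
                     query={"term": {"job": "nightly-db"}})

    last_millis = resp["hits"]["hits"][0]["sort"][0]   # sort key is epoch millis
    age_seconds = time.time() - last_millis / 1000.0
    print("backup age in seconds:", age_seconds)       # threshold at 86400 for 24 h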
In addition to the great answer by Jan Garaj, it looks like there is work being done to make this type of thing much easier in the future.
Check out this issue to track progress.

Grafana Alerting when there is no change in data for x minutes

I've been trawling the web and forums and cannot find a resource on this.
What I want to achieve is an alert for when there is no change in the data for a period of time.
We are monitoring open files for our web server(s), so this number fluctuates rather often. We've noticed that when the number stagnates, it points to an issue on the server. So what we want is: if openfiles remains at X for 2 minutes, alert us.
I made such an alert through a small succession of things:
I have a dedicated 'alerting dummy board' for all the alerts, since I can only have one alert per graph (Grafana version 6.6.0).
I use the following query: avg_over_time(delta(Sensor_Data[1m])[20s:]) - this computes the 20-second average of the change (last value minus first value) over a 1-minute window.
My data-gathering program feeds into Prometheus, and this in turn into Grafana. If this program freezes, it may keep sending the last value to Prometheus, and the above query then drops to exactly zero.
So I have an alert which fires if the above query stays within the range (-0.01, 0.01) for a minute (a typical value of the query with the system running is abs(query) > 0.18).
Thus, Grafana sends an alert if the Sensor_Data value does not change within about 2-3 minutes.
If you do use Prometheus and Alertmanager, there is a nice function that worked for me:
changes()
Using something like this in an alerting rule will trigger if there are no changes over the time interval:
changes(metric_name[5m]) == 0
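To sanity-check such an expression before wiring it into an alert, you can run it against Prometheus's HTTP query API; a sketch in Python (the server address and metric name are placeholders):

    import requests

    PROM = "http://localhost:9090"               # illustrative Prometheus address
    query = 'changes(Sensor_Data[5m]) == 0'      # matches series with no changes

    resp = requests.get(PROM + "/api/v1/query", params={"query": query})
    result = resp.json()["data"]["result"]
    print("stagnant series:", [r["metric"] for r in result])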
This has worked for me. Make sure you're using a rate or increase function (no change means it will drop to zero) and filter the query like the following:
increase(metric_name[5m]) > 0
Then, in Alert Config, set "If no data or all values are null" to "Alerting". That way, when there's no data, the alert will be triggered.

National Weather Service (NOAA) REST API returns nil for parameters of forecast

I am using the NWS REST API as my weather service for an app I am making. I was initially reluctant to use NWS because of its bad documentation, but I couldn't resist as it is offered completely free.
Now that I am trying to use it, I am running into some difficulty. When making a request for multiple days, the minimum temperature appears nil for several days.
(EDIT: As I have been testing the API more I have found that it is not always the minimum temperatures that are nil. It can be a max temp or a precipitation, it seems completely random. If you would like to make test calls using their web interface, you can do so here: http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdBrowserByDay.htm
and here: http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdXML.htm)
Here is an example of a request where the minimum temperatures are empty: http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdBrowserClientByDay.php?listLatLon=40.863235,-73.714780&format=24%20hourly&numDays=7
Surprisingly, on their website, the minimum temperatures are available:
http://forecast.weather.gov/MapClick.php?textField1=40.83&textField2=-73.70
You'll see that under the minimum temperatures there are about 5 (sometimes fewer; it is inconsistent) blank fields that say <value xsi:nil="true"/>
If anybody can help me it would be greatly appreciated; using the NWS API can be a little overwhelming at times.
Thanks,
The nil values, from what I can understand of the documentation, here and here, simply indicate that the data is unavailable.
Without making assumptions about NOAA's data architecture, it's conceivable that the information available via the API may differ from what their website displays.
Missing values are represented by an empty element and xsi:nil="true" (R2.2.1).
The nil values being returned seem to involve the time period. Notice the difference between the time-layout keys (see section 5.3.2 of the documentation) in these requests:
k-p24h-n7-1
k-p24h-n6-1
The data times are different.
<layout-key> element
The key is derived using the following convention:
“k” stands for key.
“p24h” implies a data period length of 24 hours.
“n7” means that the number of data times is 7.
“1” is a sequential number used to keep the layout keys unique.
Here, startDate is the deciding factor. Leaving it off includes a longer time window and might account for some requested data not yet being available.
Per documentation:
The beginning day for which you want NDFD data. If the string is empty, the start date is assumed to be the earliest available day in the database. This input is only needed if one wants to shorten the time window data is to be retrieved for (less than entire 7 days worth), e.g. if user wants data for days 2-5.
I'm not experiencing the randomness you mention. The folks on NOAA's Yahoo! Groups forum might be able to tell you more.
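To see exactly which values come back nil, the DWML response can be parsed directly; a sketch using only the Python standard library, with the request URL copied from the question:

    import urllib.request
    import xml.etree.ElementTree as ET

    URL = ("http://graphical.weather.gov/xml/sample_products/browser_interface/"
           "ndfdBrowserClientByDay.php?listLatLon=40.863235,-73.714780"
           "&format=24%20hourly&numDays=7")

    # xsi:nil lives in the XML Schema instance namespace
    NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

    tree = ET.parse(urllib.request.urlopen(URL))
    for temp in tree.iter("temperature"):
        kind = temp.get("type")                     # "maximum" or "minimum"
        for value in temp.iter("value"):
            if value.get(NIL) == "true":
                print(kind, "temperature: nil")
            else:
                print(kind, "temperature:", value.text)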

Ajax Security Question: Supplying Available usernames dynamically

I am designing a simple registration form in ASP.net MVC 1.0
I want to allow the username to be validated while the user is typing (as per the related questions linked to below)
This is all easy enough. But what are the security implications of such a feature?
How do I avoid abuse from people scraping this to determine the list of valid usernames?
Some related questions: 1, 2
To protect against "malicious" activity on some of my internal Ajax endpoints, I add two GET variables: one is the date (usually in epoch form); for the other, I take that date, add a salt, SHA-1 it, and send that hash along as well. If the date (when rehashed) does not match the hash, I drop the request; otherwise I fulfill it.
Of course I do the hashing before the page is rendered and pass the hash and date to the JS; otherwise it would be meaningless.
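A minimal server-side sketch of that scheme in Python (the salt is a placeholder, and the freshness window is an addition the original answer leaves implicit):

    import hashlib
    import hmac
    import time

    SALT = "replace-with-a-long-random-salt"   # placeholder; keep it secret

    def make_token():
        """Run while rendering the page; embed both values in the JS."""
        ts = str(int(time.time()))
        return ts, hashlib.sha1((ts + SALT).encode()).hexdigest()

    def check_token(ts, token, max_age=300):
        """Run on each Ajax request; reject forged or stale tokens."""
        expected = hashlib.sha1((ts + SALT).encode()).hexdigest()
        fresh = time.time() - int(ts) < max_age   # assumed freshness window
        return fresh and hmac.compare_digest(expected, token)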
The problem with using IP- or cookie-based limits is that both can be bypassed.
With a token method and a good, cryptographically strong salt (say, something like one of Steve Gibson's "Perfect Passwords", https://www.grc.com/passwords.htm ), it would take a huge amount of time (on the scale of decades) before the scheme could reliably be predicted, and it therefore ensures a certain amount of security.
You could limit the number of requests to maybe 2 per 10 seconds or so (a real user may put in a name that is taken, modify it a bit, and try again), kind of like how SO doesn't let you comment more than once every 30 seconds.
If you're really worried about it, you could take one of the methods above and count how many times they tried in a certain time period, and if it goes above a threshold, kick them to another page. A sliding-window limiter along those lines is sketched below.
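A few lines of Python are enough for such a sliding-window limit (the window and limit are the numbers suggested above; keying by client ID is an assumption, since it could equally be a session or account):

    import time
    from collections import defaultdict, deque

    WINDOW = 10.0   # seconds
    LIMIT = 2       # max username checks per window

    _history = defaultdict(deque)

    def allow(client_id):
        """Return True if this client may run another username check."""
        now = time.time()
        checks = _history[client_id]
        while checks and now - checks[0] > WINDOW:
            checks.popleft()              # discard checks outside the window
        if len(checks) >= LIMIT:
            return False                  # over the limit; drop or delay
        checks.append(now)
        return True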
Validated as in "This username is already taken"? If you limit the number of requests per second, it should help.
One common way to solve this is simply by adding a delay to the request: when the request reaches the server, wait one (or more) seconds before responding with the result (whether the name is available or not).
Adding a time barrier doesn't really affect users who aren't trying to scrape, and you get a 60-requests-per-minute limit for free.
Building on the answer provided by UnkwnTech, which is some pretty solid advice:
You could go a step further and make the client perform some of the calculation to create the return hash. This could be some simple arithmetic, like subtracting a few numbers, adding the date, and multiplying by 2.
The added arithmetic means an out-of-the-box username-scraping script is unlikely to work, and it forces the client to burn more CPU.
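Server-side, verifying that client computation might look like the sketch below; the exact arithmetic is just an illustration, and the client-side JavaScript would have to mirror it:

    import hashlib
    import hmac

    SALT = "replace-with-a-long-random-salt"   # placeholder salt

    def expected_hash(ts, a, b):
        """ts, a, b were embedded in the page; the client is expected to
        compute (a - b + ts) * 2 and hash it together with the salt."""
        challenge = str((a - b + int(ts)) * 2)
        return hashlib.sha1((challenge + SALT).encode()).hexdigest()

    def check(ts, a, b, client_hash):
        return hmac.compare_digest(expected_hash(ts, a, b), client_hash)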
