Kimono Scrape Remains "In Progress" - google-sheets

I am having issues with Kimono Labs. Every scrape I run runs indefinitely without throwing an error or completing. Occasionally, the scrapes will start working again days later without any changes on my part, only to fail again a few days after that. I love Kimono because it is so easy to integrate with Google Sheets for friends to alter the data, but this has become problematic. There doesn't seem to be any related help in the Kimono help documentation for an issue like this.
One of my scrapes is not behind a paywall and the other is. One is set to run daily and the one behind the paywall is set to run hourly.
What steps can I take to troubleshoot this error and get the ball rolling again?

I had a very simple API doing the exact same thing for weeks!
I'm only using a free account so I didn't have any support, but I ended up sending a bug report at https://www.kimonolabs.com/support.
Strangely enough, the very next day, the API started working normally again (and has ever since). I assume they looked into it and fixed whatever was stopping my crawl from completing.

Related

Frequently refreshing web page during long-running process

I've been hunting around my issue for a while. Probably the best I've come up with is another Stack Overflow question: How should I perform a long-running task in ASP.NET 4?
I'm in a similar place in that I want to understand what my options are, but I don't feel I know enough about MVC specifically to come to a view. I'm using MVC 5 with .NET Framework 4.8, and I note that technologies such as SignalR have become available since that question was asked. I was wondering if any experienced MVC'ers could give me a view?
I too have a long running process. More specifically, the user is importing a file. The file is delimited so the import happens line by line. The file might be thousands of lines long. Each line will be parsed and imported in a fraction of a second but the whole operation might take several minutes.
I don't particularly need behaviour to be asynchronous, but because of the length of the entire process I want to regularly update the user on progress. I'm wondering what options I have?
I've got a vague recollection that I might have looked at this problem 20-odd years ago (Classic ASP) and solved it with regular flushes, sending a bit more of the page to the client every few seconds. But I'm now also trying to use a _Layout page, so the page has already been sent back, and I don't think I have that option, even assuming such a mechanism still exists. A bit more recently, but still a while ago, I might have used JavaScript to poll, but everything I'm reading now seems to point me to newer technologies which I'm not sure I fully understand yet.
I'm just wondering how would you solve this problem?
I would not be performing any of the file parsing on the web server, especially if it's thousands of rows long. I would delegate this to a background service of sorts, whether that be a Lambda service in the cloud or a Windows service or a scheduled task. You could then call your SignalR hub from the background task (whatever that might be) to update the progress of the import.
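To make the idea concrete, here is a minimal, language-neutral sketch in Python rather than C#. It assumes the file is comma-delimited and that progress is pushed through a callback; `import_lines`, `start_import` and `report` are hypothetical names, and in the ASP.NET case the callback would be wherever the background task calls the SignalR hub. It is only an illustration of "import line by line off the request thread and report every N lines", not a definitive implementation.

```python
import threading
from typing import Callable, Iterable, List


def import_lines(lines: Iterable[str],
                 total: int,
                 report: Callable[[int, int], None],
                 every: int = 100) -> List[List[str]]:
    """Parse delimited lines one by one, reporting progress every `every` lines."""
    rows = []
    for done, line in enumerate(lines, start=1):
        # Stand-in for the real parse-and-import step (assumes comma-delimited input).
        rows.append(line.rstrip("\n").split(","))
        if done % every == 0 or done == total:
            # Hypothetical hook: this is where the hub/progress call would go.
            report(done, total)
    return rows


def start_import(lines, total, report, every=100) -> threading.Thread:
    """Run the import on a worker thread so the web request can return immediately."""
    worker = threading.Thread(target=import_lines, args=(lines, total, report, every))
    worker.start()
    return worker


if __name__ == "__main__":
    data = [f"{i},item{i}" for i in range(250)]
    t = start_import(data, len(data), lambda d, n: print(f"{d}/{n} lines imported"))
    t.join()
```

Reporting every N lines rather than on every line keeps the number of progress messages small even when the file has thousands of rows.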

Watson Retrieve and Rank Questions Upload Failed

I've been working on a project involving the Watson Retrieve and Rank service and it was acting normally until now. I managed to upload a number of documents and created roughly 50 questions to start off. Normally, I was able to upload the questions just fine, but now I keep getting an error saying "Questions upload Upload failed".
I have tried different browsers and incognito mode, yet nothing seems to solve the issue. I either get the error or the upload questions animation plays endlessly.
This is what it looks like as I try to upload the questions
If anyone could give some insight on how to approach this problem, it would be great.
Can you provide the entire error log?
Are you sure the Solr cluster and collection are created correctly? The Standard Plan for this service only allows 7 rankers in the free plan.
You can try it with a new instance of the service.
Are you sure your training data meets the requirements?
Training data requirements:
https://www.ibm.com/watson/developercloud/doc/retrieve-rank/training_data.html
Retrieve and Rank wasn't working correctly on Wednesday and Thursday, but today it's up and running properly.

YouTube API v3 Request still counting after stopped

I have a YouTube API Key, and I was testing it out when I started watching my [requests] through the dashboard?project=myproject-111111&duration=PT1H
The issue is this. I stopped using the key about 15 minutes ago, and that blasted thing is STILL counting. Since I stopped using it, it has gone from
9,875 -to- 39,978 (And still counting)
Why would this still be counting? The key is NOT being used, but it's counting.
You are allowed 1 million requests per 24-hour day, and at this rate I will be there in no time flat.
I have tried to find an active forum for the YouTube API, and there is none. The only ones I found had their last post in 2012 and 2014.
Any ideas why this thing is still counting? (46,333)
Updated 10:47 pm EST: On the [Quotas] page, it is over 300,000 and counting. This is a blasted joke. I reported it as a bug, but the bug reporting page is so full of spam that it makes you wonder whether they are going to check it regularly or not.
Updated 10:51 pm EST: It finally stopped. Conflicting count returns are coming in from the different pages.
118,762 on the dashboard
415,489 on the YouTube Quotas page.
I went to the YouTube Developers Twitter page and tweeted to them about the issue, and am awaiting a reply back.
I will post here, once I get a reply back.
Is this a bug?
Or some other issue?
CodingEE
This is something that I was looking at, as quota is eaten up by API calls that the site is not actually making.
This is unfortunately something for Google to fix. I did send a request to look into it, but got no reply.
The cause is "BOTS". They will search, index and effectively run the page in order to index and rate it. As such, they will incur API usage.
This can be reduced in a few ways.
1: If you are in the testing stage and do not use the online version much apart from testing updates to code, then you can remove the API key from the code while not testing.
2: If it is live and running but you wish to stop the bots eating the quota, then block the bots in your Google dashboard.
This will, however, stop indexing. So I would recommend "trying" to limit them and how often they index.
Aside from that, we need to get Google to (if possible) not use quota for bots.
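If bots really are what is eating the quota (that is my assumption, not something Google has confirmed), one way to "limit" them without blocking indexing could be to skip the API call when the visitor looks like a crawler. A rough sketch in Python: the marker list is illustrative only, and `fetch_videos` and `call_api` are hypothetical names standing in for whatever code makes the real YouTube request.

```python
from typing import Callable, Optional

# Illustrative substrings only; real crawlers identify themselves in many ways.
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def is_probable_crawler(user_agent: Optional[str]) -> bool:
    """Guess from the User-Agent header whether the visitor is a crawler."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def fetch_videos(user_agent: Optional[str],
                 call_api: Callable[[], object],
                 placeholder: object = None) -> object:
    """Spend API quota only for visitors that do not look like crawlers.

    `call_api` is a hypothetical zero-argument function wrapping the real request.
    """
    if is_probable_crawler(user_agent):
        # The page can still render and be indexed, but no quota is used.
        return placeholder
    return call_api()
```

User-Agent sniffing is easy to fool, so treat this as a way to cut down accidental usage, not as protection for the key.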

CloudKit 'Unexpected Server Error' Anytime Manual Operations Performed in Dashboard

I have been developing an iOS app that utilizes the CloudKit feature available for Apple Developers. I've found it to be a wonderful resource, especially since the very day I started designing my backend, the service I was intending to use (Parse) announced it was shutting down. It's very appealing due to its small learning curve, but I'm starting to notice some annoying little issues here and there, so I'm seeking out some experts for advice and help. I posted another CloudKit question a couple of days ago about a problem that is still occurring: CloudKit Delete Self Option Not Working. But I want to limit this question to a different issue that may be related.
Problem ~ Ever since I started using CloudKit I have noticed that whenever I manually try to edit (delete an entry, remove or add part of a list, even add a DeleteSelf option to a CKReference after creation), and then try to save the change, I get an error message and cannot proceed. Here is a screenshot of the error window that appears:
It's frustrating because anytime I want to manipulate a record to perform some sort of test, I either have to do it through my app, or just delete the record entirely and create a new one (which I am able to do without issue). I have been working around this issue for over a month now because it wasn't fatal to my progress. However, I am starting to think that this could be related to my other CloudKit issues, and maybe if I could get some advice on how to fix it I could also solve my other problems. I have filed numerous bug reports with Apple, but haven't received a response or seen any changes.
I'd also like to mention that for a very long time now (at least a few days), I've noticed down in the bottom left-hand corner of my Dashboard that it is consistently saying that it's "Reindexing Development Data". At first that wasn't an issue: I would get that notification after making a change, but it would go away once the operation was complete. Now it seems to be stuck somewhere inside the process. And this is a chronic issue; it's saying this all the time, even right when I log into my dashboard.
Here is what I'm talking about:
As time goes on I find more small issues with CloudKit, and I'm concerned that once I go into production more problems could start manifesting and I could have a serious issue. I'd love to stick with CloudKit and avoid the learning curve of a different service like Amazon Web Services, but I also don't want to set myself up for failure.
Can anyone help me with this issue, or has anyone else experienced it on a regular basis? Thanks for the advice and help!
Pierce,
I found myself in a similar situation; the issue seemed to be linked to Assets, as I had an Asset in my record definition. Several others and I reported the re-indexing issue on the Apple support website, and after about a month it eventually disappeared.
Have you tried resetting your database schema completely? Snapshot the definition first, since the reset zaps it completely; see inset.
Ultimately I simply created a new project, linked it to CloudKit and used the new container in my original app.

Iphone app that needs to scrape a website once every day

So I'm making an iPhone application that needs to scrape a website once every day.
What I'm going to scrape is a table of upcoming games for that same day for a soccer division. That's why I need the app to scrape the same page and the same table once every day to keep the upcoming games updated.
I was referred to import.io, but they didn't have anything like a scheduled re-crawl.
I would love to get some ideas and tips on how I should do this, since I'm stuck now.
You might take a look at https://www.kimonolabs.com/
I played around with the service a while back and was impressed with how easy it was to set up. They have a "free" option so long as the APIs you create are not private.
Oh, and I agree with Paul, screen scraping is not something the iOS client should be doing. Too fragile, and when (not if) something breaks, you will need to go through an Apple review process to fix it.
This doesn't seem like something an app should do, your server should do it (so that the scraping is only performed once), and your clients can retrieve it from your server. That also means you could send out push notifications for important fixtures etc. Maybe that's what you meant, anyway.
If it's on the server you can just set up a scheduler (in Java, for example) to run once every x hours (probably a smaller number than 24, assuming you don't know when the website is going to be updated). Then your app can just get the latest list of fixtures from your server on startup, on pull-to-refresh, etc. Presumably someone will open your app, look at the fixtures, then come out of your app, so it doesn't seem like you need to cover the case where someone is in your app all day. If you did, you could use NSTimer to run every x minutes after the initial on-startup server call.
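As a rough illustration of the server side, here is a minimal sketch in Python (a Java scheduler would look much the same). It assumes a `scrape` function that fetches and parses the fixtures table; that function, `FixtureCache` and `run_every` are hypothetical names, and the endpoint your app calls would simply return `cache.latest()`.

```python
import threading
import time
from typing import Callable, List, Optional


class FixtureCache:
    """Holds the most recent scrape result so clients never trigger a scrape themselves."""

    def __init__(self, scrape: Callable[[], List[str]]):
        self._scrape = scrape  # hypothetical function that fetches and parses the table
        self._lock = threading.Lock()
        self._fixtures: List[str] = []
        self.last_updated: Optional[float] = None

    def refresh(self) -> None:
        try:
            fixtures = self._scrape()
        except Exception:
            # Keep serving the previous list if the site is down or its layout changed.
            return
        with self._lock:
            self._fixtures = fixtures
            self.last_updated = time.time()

    def latest(self) -> List[str]:
        with self._lock:
            return list(self._fixtures)


def run_every(interval_seconds: float, task: Callable[[], None], stop: threading.Event) -> None:
    """Run `task` immediately, then again every `interval_seconds` until `stop` is set."""
    while not stop.is_set():
        task()
        stop.wait(interval_seconds)
```

To refresh every six hours, for example, you could start it with `threading.Thread(target=run_every, args=(6 * 3600, cache.refresh, stop), daemon=True).start()`. Because a failed scrape leaves the old list in place, a broken page layout degrades to stale fixtures rather than an empty screen in the app.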

Resources