How do Aweber and FeedBlitz report subscriber numbers to Feedburner?

I've looked all over for some documentation on this, but haven't found it. Some posts reference a user-agent string:
http://groups.google.com/group/feedburner-services/browse_thread/thread/7aee14cf6a2432e7/49464335d2228e25?lnk=gst&q=aweber#49464335d2228e25
I had assumed there would be an API or something. More generally, how does any RSS feed reader/aggregator (like Bloglines, etc.) report subscriber numbers to Feedburner?
I'm working on developing a new app that would need this functionality.
Thanks for your help!
Brian

As you discovered in your link, you put the subscriber count in your user-agent, then you contact the Feedburner Support Group and tell them what format you will be using.
The consensus format is something like:
User-agent: Service Name (http://example.com/service/info/; ### subscribers ; [optional feed identifier] )
The optional feed identifier is typically used if you run several different services, and fetch the feed separately for each one; e.g. if you have a mail service and a web-based reader service, with different subscribers, then you might either use:
User-agent: SO Agg/1.3 (http://example.com/SOAgg ; 5000 subscribers ; feed-id=mail-134 )
on the request for the mailer, and
User-agent: SO Agg/1.3 (http://example.com/SOAgg ; 2000 subscribers ; feed-id=web-134 )
on the request for the website; or use
User-agent: SO Agg/1.3 (http://example.com/SOAgg ; 7000 subscribers ; )
if your system makes only one request for both services...
You will usually need to specify what IP addresses are authorised to request the feed with that user-agent, as well.
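For example, a minimal PHP sketch of a feed fetch that reports a count in this format (the service name, info URL, feed id, and feed URL below are placeholders, not real endpoints):
<?php
// Sketch: fetch a feed while reporting a subscriber count in the User-Agent,
// using the consensus format described above. All names/URLs are placeholders.
$subscribers = 5000;
$userAgent = sprintf(
    'SO Agg/1.3 (http://example.com/SOAgg ; %d subscribers ; feed-id=mail-134 )',
    $subscribers
);

$ch = curl_init('http://feeds.feedburner.com/SomeFeed'); // hypothetical feed URL
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_USERAGENT      => $userAgent, // FeedBurner parses the count from here
    CURLOPT_FOLLOWLOCATION => true,
]);
$feedXml = curl_exec($ch);
curl_close($ch);
?>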

Many major aggregators report user stats by including them as part of the user-agent string. Examples:
Bloglines reporting description in blog comment
Google Reader: Tips for Publishers
PostRank: Reporting Subscription Counts
There's no standard for this at this time.
To the best of my knowledge, folks will contact major feed analytics vendors like Feedburner directly to make sure their user-agent-based reporting is being counted.


Gmail API, Reply to thread not working / forwarding

I'm using the Google Gmail API in Swift. It's all working well and compiling.
I'm now trying to forward an email; the only way I can see to do this so far is by using a thread id.
So I'm using Google's API tester to send test requests; we'll focus on that here.
So I've input this; the "raw" value is a base64url-encoded string:
{
  "raw": "VG86ICBlbWFpbFRvU2VuZFRvQGdtYWlsLmNvbSAKU3ViamVjdDogIFRoZSBzdWJqZWN0IHRlc3QKSW4tUmVwbHktVG86ICBteUVtYWlsQGdtYWlsLmNvbQpUaHJlYWRJZDogIDE1YjkwYWU2MzczNDQ0MTIKClNvbWUgQ29vbCB0aGluZyBpIHdhbnQgdG8gcmVwbHkgdG8geW91ciBjb252by4u",
  "threadId": "15b90ae637344412"
}
The "raw" in plain text is
To: emailToSendTo#gmail.com
Subject: The subject test
In-Reply-To: myEmail#gmail.com
ThreadId: 15b90ae637344412
Some Cool thing i want to reply to your convo..
When I execute it, I get this back:
{
  "id": "15b944f6540396df",
  "threadId": "15b90ae637344412",
  "labelIds": [
    "SENT"
  ]
}
But when I check both email accounts (the from and the to), the new message does not appear in the same thread or conversation as the previous messages.
If anyone can help it would be much appreciated; I've spent all day on this issue, plus half of yesterday, and have done tons of research on it.
As stated in the documentation quoted below, I believe I'm adding the threadId and In-Reply-To in the right way:
The ID of the thread the message belongs to. To add a message or draft to a thread, the following criteria must be met:
The requested threadId must be specified on the Message or Draft.Message you supply with your request.
The References and In-Reply-To headers must be set in compliance with the RFC 2822 standard.
The Subject headers must match.
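Reading those criteria against the raw message above: In-Reply-To and References must carry the original message's RFC 2822 Message-ID header (not an email address, and not Gmail's hex message id), the Subject must match the original, and threadId belongs in the JSON body rather than in the mail headers. A minimal sketch in PHP of a correctly threaded "raw" payload; the Message-ID below is hypothetical, and in practice you would read it from the original message's headers:
<?php
// Sketch: build a "raw" message that Gmail should thread correctly.
// Hypothetical assumption: the original message's Message-ID header is
// <CAOriginal123@mail.gmail.com> and its subject was "The subject test".

$originalMessageId = '<CAOriginal123@mail.gmail.com>'; // from the original message's headers
$threadId          = '15b90ae637344412';

$rfc2822 =
    "To: emailToSendTo@gmail.com\r\n" .
    "Subject: Re: The subject test\r\n" .     // must match the original subject
    "In-Reply-To: $originalMessageId\r\n" .   // a Message-ID, not an email address
    "References: $originalMessageId\r\n" .
    "\r\n" .
    "Some cool thing I want to reply to your convo..";

// Gmail's "raw" field is base64url-encoded (URL-safe alphabet, no padding).
$raw = rtrim(strtr(base64_encode($rfc2822), '+/', '-_'), '=');

// threadId is a sibling of "raw" in the request body, not a mail header.
echo json_encode(['raw' => $raw, 'threadId' => $threadId], JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
?>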

How to set Vendor Tax ID and 1099 Eligibility in API?

I'm currently using Consolibyte's PHP QB classes to interface with the QB api.
I've been successfully creating and updating Vendors in QB for a while. However, we have a new requirement to use the API to store vendors' tax information.
I've tried to lookup the correct syntax to set these, but have been unsuccessful thus far.
My most recent attempt was:
$Vendor->setVendorTaxIdent($provider->taxId);
$Vendor->setIsVendorEligibleFor1099(true);
The rest of the information set gets updated properly, and the return from
$result = $VendorService->update($this->context, $this->realm, $provider->vendorId, $Vendor);
seems to indicate success.
Please let me know if you need any more context. Thanks!
Have you referred to the documentation?
https://developer.intuit.com/docs/api/accounting/Vendor
The documentation indicates:
TaxIdentifier: String, max 20 characters
Vendor1099: Boolean
The getters and setters exactly mirror the documented fields, so unsurprisingly you'll have these methods:
$Vendor->setTaxIdentifier($string);
$string = $Vendor->getTaxIdentifier();
And:
$Vendor->setVendor1099($boolean);
$boolean = $Vendor->getVendor1099();
If you continue to have trouble, make sure you post the XML request you're sending to QuickBooks. You can get this by doing:
print($VendorService->lastRequest());
print($VendorService->lastResponse());

How do I fetch orgusers beyond the first 100 using the Google Apps OrgUser provisioning API feed?

I am using Zend's gdata library for the Google Apps provisioning API. Since Zend doesn't yet support fetching org users (no retrieve function provided by the library for this feed), I am making a custom gdata query to the URL (as suggested in the documentation at developers.google.com/google-apps/provisioning/#retrieving_organization_users_experimental):
apps-apis.google.com/a/feeds/orguser/2.0/'.$customerId.'?get=all
This works well for <= 100 users.
Now, I have created a domain with 125 users across 5 OUs. When I fetch the above URI, I get the 1st 100 users (as documented and expected). However, I could not find the pagination link mentioned here: developers.google.com/google-apps/provisioning/reference#Results_Pagination
Here's the start of my orguser feed:
<?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:apps='http://schemas.google.com/apps/2006'><id>https://apps-apis.google.com/a/feeds/orguser/2.0/C00xxxxxxx</id><updated>2013-01-06T08:17:43.520Z</updated><**link rel='next' type='application/atom+xml' href='https://apps-apis.google.com/a/feeds/orguser/2.0/C00xxxxxxx?get=all&startKey=RASS03jtnz0s2orxmbn.**'/><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='https://apps-apis.google.com/a/feeds/orguser/2.0/C00xxxxx'/>
I tried the https://apps-apis.google.com/a/feeds/orguser/2.0/C00xxxxxxx?get=all&startKey=RASS03jtnz0s2orxmbn. link, but it gives me the exact same 100 users that the https://apps-apis.google.com/a/feeds/orguser/2.0/C00xxxxxxx?get=all link gives. This is the only occurrence of the word "next" in my feed, so there is no other URI I can try to fetch the next 25 users.
So I have only been able to get 100 users from this API call. How do I go about fetching the next 25 users? Examples/code would be really appreciated. Or what am I doing wrong?
Please help - this is blocking an urgent delivery.
Thanks!
Vinay.
Your 2nd request should look like:
https://apps-apis.google.com/a/feeds/orguser/2.0/C00xxxxxx?startKey=RASS03jtnz0s2orxmbn&get=all
startKey should be set to the value from the next link, and get should continue to be all for each page request.
Also, make sure the URL is decoded; if & is encoded as &amp; in the request, then Google's servers will see all of all&amp;startKey=RASS03jtnz0s2orxmbn as the value of get and won't see a startKey parameter at all.
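A hedged PHP sketch of the whole loop, paging until no rel='next' link remains; $customerId and $authToken (a valid OAuth token) are assumed to already exist, and error handling is omitted:
<?php
// Sketch: page through the orguser feed by following each page's rel='next' link.
$url = "https://apps-apis.google.com/a/feeds/orguser/2.0/$customerId?get=all";

while ($url !== null) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ["Authorization: Bearer $authToken"],
    ]);
    $xml = curl_exec($ch);
    curl_close($ch);

    $feed = new SimpleXMLElement($xml);
    foreach ($feed->entry as $entry) {
        // ... process each orguser entry ...
    }

    // Follow the pagination link if present. SimpleXML decodes &amp; for us,
    // so startKey survives as a real query parameter on the next request.
    $url = null;
    foreach ($feed->link as $link) {
        if ((string) $link['rel'] === 'next') {
            $url = (string) $link['href'];
            break;
        }
    }
}
?>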

How to block bad unidentified bots crawling my website?

How can I stop bad, unidentified bots from crawling my website? Some bad bots, whose names do not appear in cPanel's Apache logs, are eating up my website's bandwidth.
I have tried robots.txt (at batgap.com/robots.txt) and also blocking via .htaccess, but there has been no improvement in bandwidth usage. I don't know the IPs of those bots, so I am unable to block them by IP address. These bots consume so much of the site's bandwidth that I have to request more from the server.
I'm from Incapsula and we deal with bad bots on a regular basis.
We recently released bot-related research that provides insight into the scope of the problem (http://www.incapsula.com/the-incapsula-blog/item/225-what-google-doesnt-show-you-31-of-website-traffic-can-harm-your-business), and in light of this data I have to agree with @Leonard Challis: you simply cannot handle bot protection manually.
Having said that, there are bot protection solutions, even Free ones (us included) that can help you with bad bots.
BTW, just as you mentioned, one byproduct of bad bot visits is lost bandwidth.
We've recently become aware of just how surprisingly huge bot-related bandwidth usage really is.
This is an interesting topic in itself.
We believe that by avoiding bad bot traffic, hosting providers could actually greatly improve their efficiency (hopefully using this to cut costs or improve services). Once you consider the social and business implications of this, you can understand the real scope of the bad bot problem, which goes way beyond the immediate damage done.
I block 'bad bots' by using PHP.
I filter in IP address primarily, then by User-Agent secondarily.
I make the 'bad bot' wait for up to 999 seconds, then return a very small web page.
Usually (always) the internet connection times out and zero (0) bytes are returned.
Best of all, I have delayed them for a few minutes before they get to the next victim.
http://gelm.net/How-to-block-Baidu-with-PHP.htm
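A hedged PHP sketch of that tarpit idea; $isBadBot is a hypothetical flag that you would set from your own IP/User-Agent checks (like the ones in the script further down this thread):
<?php
// Tarpit sketch: stall a detected bad bot, then serve a near-empty page.
// $isBadBot is a hypothetical flag set by your own detection logic.
if ($isBadBot) {
    set_time_limit(0);   // allow the long sleep without PHP killing the request
    sleep(999);          // most bot connections time out long before this
    header('Content-Type: text/html');
    echo '<html><body></body></html>'; // tiny page if it is still listening
    exit();
}
?>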
Unfortunately robots.txt is sometimes ignored by these "bad bots", though if the problem is more genuine search engine spiders that you don't want to see, they ought to take it into account. I presume with cPanel you can get into the web server (Apache) logs? In there you can look for two things: the IP and the User-Agent. You can find the culprits in there and add them to your robots.txt and .htaccess. Note that .htaccess rules denying IP addresses are far better than just relying on robots.txt, because you are taking the choice out of the bot creator's hands.
If you know specific bots which are doing this you should be able to get IP addresses and user-agents from forums, but if it's a more general thing then really I'm afraid it's more of a manual job.
There are other methods that can be used with varying effect, such as mod_security (http://www.askapache.com/htaccess/modsecurity-htaccess-tricks.html) but this will mean you'll have to access your web server configuration.
Finally, you can check the links that are pointing to your web site (using the link: option on Google). Sometimes if you have links on spammy forums or the like, this can increase the chances of bots coming to get you. Maybe you can look at the referer URL in the Apache logs, but this is all based on a lot of presumptions and you'd probably be lucky if it had a great effect.
Block Unwanted Robots/Spiders visitors via PHP
Instructions:
Place the following PHP code at the beginning of your index.php file.
The idea here is to place the code in the main site's PHP home page, the main entry point of the site.
If you have other PHP files that are accessed directly via a URL (not including PHP include or require support-type files), then place the code at the beginning of those files.
For most PHP sites and PHP CMS sites, the root's index.php file is the main entry point of the site.
Keep in mind that your site statistics, i.e. AWStats, will still log the hits under Unknown robot (identified by 'bot' followed by a space or one of the following characters _+:,.;/-), but these bots will be blocked from accessing your site's content.
<?php
// ---------------------------------------------------------------------------------------------------------------
// Banned IP Addresses and Bots - Redirects banned visitors who make it past the .htaccess and or robots.txt files to an URL.
// The $banned_ip_addresses array can contain both full and partial IP addresses, i.e. Full = 123.456.789.101, Partial = 123.456.789. or 123.456. or 123.
// Use partial IP addresses to include all IP addresses that begin with a partial IP addresses. The partial IP addresses must end with a period.
// The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain keyword strings found within the User Agent string.
// The $banned_unknown_bots array is used to identify unknown robots (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-).
// The $good_bots array contains keyword strings used as exemptions when checking for $banned_unknown_bots. If you do not want to utilize the $good_bots array such as
// $good_bots = array(), then you must remove the the keywords strings 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots will also be banned.
$banned_ip_addresses = array('41.','64.79.100.23','5.254.97.75','148.251.236.167','88.180.102.124','62.210.172.77','45.','195.206.253.146');
$banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
$banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
$good_bots = array('Google','MSN','bing','Slurp','Yahoo','DuckDuck');
$banned_redirect_url = 'http://english-1329329990.spampoison.com';
// Visitor's IP address and Browser (User Agent)
$ip_address = $_SERVER['REMOTE_ADDR'];
$browser = $_SERVER['HTTP_USER_AGENT'];
// Declared Temporary Variables
$ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';
// Checks for Banned IP Addresses and Bots
if($banned_redirect_url != ''){
// Checks for Banned IP Address
if(!empty($banned_ip_addresses)){
if(in_array($ip_address, $banned_ip_addresses)){$ipfound = 'found';}
if($ipfound != 'found'){
$ip_pieces = explode('.', $ip_address);
foreach ($ip_pieces as $value){
$piece = $piece.$value.'.';
if(in_array($piece, $banned_ip_addresses)){$ipfound = 'found'; break;}
}
}
if($ipfound == 'found'){header("location: $banned_redirect_url"); exit();}
}
// Checks for Banned Bots
if(!empty($banned_bots)){
foreach ($banned_bots as $bbvalue){
$pos1 = stripos($browser, $bbvalue);
if($pos1 !== false){$botfound = 'found'; break;}
}
if($botfound == 'found'){header("location: $banned_redirect_url"); exit();}
}
// Checks for Banned Unknown Bots
if(!empty($good_bots)){
foreach ($good_bots as $gbvalue){
$pos2 = stripos($browser, $gbvalue);
if($pos2 !== false){$gbotfound = 'found'; break;}
}
}
if($gbotfound != 'found'){
if(!empty($banned_unknown_bots)){
foreach ($banned_unknown_bots as $bubvalue){
$pos3 = stripos($browser, $bubvalue);
if($pos3 !== false){$ubotfound = 'found'; break;}
}
if($ubotfound == 'found'){header("location: $banned_redirect_url"); exit();}
}
}
}
// ---------------------------------------------------------------------------------------------------------------
?>

How to get twitter server time?

So I'm trying to build a real-time monitoring tool for Twitter keywords using TweetSharp. I'm using the search API to run queries every 10-15 seconds. When I make the calls, I only want to collect tweets that have appeared since the previous update.
var twitter = FluentTwitter.CreateRequest().AuthenticateAs("username", "password").Search().Query().Containing("key word").Take(1000);
var response = twitter.Request();
currentResponseDateTime = Convert.ToDateTime(response.ResponseDate);
var messages = from m in response.AsSearchResult().Statuses
               where m.CreatedDate > lastUpdateDateTime
               select m;
lastUpdateDateTime = currentResponseDateTime;
My issue is that the Twitter server time differs from the client time by a few seconds. I looked around and tried to get the datetime I received the response at from the Response.ResponseDate property, but it looks like that is set based on the local computer time; i.e. currentResponseDateTime is a few seconds ahead of the Twitter server time, so I end up not collecting a few of the tweets.
Does anyone know how I can get the current server time from the Twitter search or REST API?
Thanks
I'm not sure how you would get the local server time of the Twitter service, but one approach you could take is to store the date of the most recent Twitter update seen in the "lastUpdateDateTime" field. That way, you're guaranteed to get all the messages since the last one you saw, regardless of the offset of the Twitter server.
var twitter = FluentTwitter.CreateRequest().AuthenticateAs("username", "password").Search().Query().Containing("key word").Take(1000);
var response = twitter.Request();
currentResponseDateTime = Convert.ToDateTime(response.ResponseDate);
var messages = from m in response.AsSearchResult().Statuses
               where m.CreatedDate > lastUpdateDateTime
               select m;
lastUpdateDateTime = messages.Select(m => m.CreatedDate).Max();
Another approach (and one that Twitter recommends) is to pull the Date header from their API server's response, which provides Twitter's notion of time in GMT. This assumes that you can access the server response headers, and that depends on the method you're using to access the API.
For example, hitting https://api.twitter.com/1/help/test.json
$ lynx --dump --head https://api.twitter.com/1/help/test.json
HTTP/1.0 200 OK
Date: Tue, 22 Jan 2013 13:30:36 GMT
...
Reference: "how to get the twitter server time (synchronize)?" on the dev.twitter.com support site.
Quoting Taylor Singletary:
The current time that Twitter "thinks" it is is returned in the "Date" HTTP header of every response to an API call you make. You can also issue a simple HTTP HEAD request to GET help/test to get the header as an initial syncing step for your app.
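A small PHP sketch of that syncing step, under the assumption that reading the Date header of the same test endpoint shown in the lynx example above is enough for your use case:
<?php
// Sketch: estimate the offset between Twitter's clock and ours by issuing
// a HEAD request and parsing the Date response header.
$ch = curl_init('https://api.twitter.com/1/help/test.json');
curl_setopt_array($ch, [
    CURLOPT_NOBODY         => true,  // HEAD request; headers are all we need
    CURLOPT_HEADER         => true,
    CURLOPT_RETURNTRANSFER => true,
]);
$headers = curl_exec($ch);
curl_close($ch);

if (preg_match('/^Date:\s*(.+?)\s*$/mi', $headers, $m)) {
    $twitterTime = strtotime($m[1]); // Twitter's notion of "now", in GMT
    $offsetSecs  = $twitterTime - time();
    echo "Twitter time: " . gmdate('r', $twitterTime) . " (offset {$offsetSecs}s)\n";
}
?>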
