How to properly "OR" word groups using Twitter's advanced search?

I'm trying to do an advanced search to get tweets that MUST include at least one word from each of the following three word groups: html or css, developer or engineer, home or remote.
I've read Twitter's documentation, and the query I should be using is:
html OR css developer OR engineer home OR remote
And I also tried:
html OR css AND developer OR engineer AND home OR remote
https://twitter.com/search?f=tweets&vertical=default&q=html%20OR%20css%20developer%20OR%20engineer%20home%20OR%20remote&src=typd
I'm getting inaccurate results: it shows tweets that don't have at least one word from each word group.
Where is the issue? I've contacted Twitter's support but they don't respond to individual reports :/
ATTENTION: I don't want results from the Top tab. The Top tab only shows popular tweets. The Live tab shows all the tweets, and that's what I want. https://support.twitter.com/articles/131209

Just put quotes around the words.
"html" OR "css" AND "developer" OR "engineer" AND "home" OR "remote" seems to do exactly what you want.

You need to put brackets around the terms you wish to group. For example:
html OR css AND developer OR engineer AND home OR remote
becomes
(html OR css) AND (developer OR engineer) AND (home OR remote)

You need to put brackets around each group. Also "AND" should be replaced with a space. Try this:
(html OR css) (developer OR engineer) (home OR remote)

According to Twitter's Search API documentation:
Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead.
So that's why the search results are "inaccurate". I ended up creating a Node.js script that uses the Streaming API and filters the tweets I want.
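For reference, a minimal sketch of that kind of script, assuming the twit npm package (the credentials are placeholders): the statuses/filter endpoint ORs all tracked terms, so the AND-of-the-groups logic has to be applied in code.

const Twit = require('twit');

const T = new Twit({
  consumer_key: 'YOUR_CONSUMER_KEY',        // placeholder credentials
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token: 'YOUR_ACCESS_TOKEN',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
});

// One array per word group from the question.
const groups = [
  ['html', 'css'],
  ['developer', 'engineer'],
  ['home', 'remote']
];

// Track every keyword; the stream delivers any tweet containing at least one of them.
const stream = T.stream('statuses/filter', { track: groups.flat() });

stream.on('tweet', (tweet) => {
  const text = tweet.text.toLowerCase();
  // Keep only tweets that contain at least one word from EACH group.
  if (groups.every((group) => group.some((word) => text.includes(word)))) {
    console.log(tweet.text);
  }
});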

Related

Get all URLs indexed by Google from a website

I want a program that, given a website, gets all the URLs indexed for it, with clean output: all the URLs, line by line. It should also find the indexed URLs that are not used on the website (a spider can already find the ones that are used).
I have been searching and only finding sloppy options. What I want is accurate and simple: INPUT: URL. OUTPUT: ALL THE URLS.
I don't know of such an application offhand, but I'll try to simplify your task by dividing it:
You need a list of your website's internal links. Any webcrawler tool can do that.
You need a list of your website's pages indexed by Google. There are a lot of search engine index checkers; you can google for one.
Compare the second list to the first one, and find all the links present in Google's index but missing from your website (a rough sketch of this step is below).
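A minimal sketch of that comparison step, assuming both lists have been exported to plain text files with one URL per line (the file names are made up):

const fs = require('fs');

const readUrls = (file) =>
  new Set(fs.readFileSync(file, 'utf8').split('\n').map((u) => u.trim()).filter(Boolean));

const crawled = readUrls('site-links.txt');   // from the webcrawler tool (step 1)
const indexed = readUrls('google-index.txt'); // from the index checker (step 2)

// URLs Google has indexed but which no longer appear anywhere on the site.
[...indexed].filter((url) => !crawled.has(url)).forEach((url) => console.log(url));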

URL with pseudo-anchors and duplicate content / SEO

I have a product page with options in a select list (e.g. the color of the product, etc.).
You can access my product page with different URLs:
www.mysite.com/product_1.html
www.mysite.com/product_1.html#/color-green
If you access it with the URL www.mysite.com/product_1.html#/color-green, the green option in the select list is automatically selected (with JavaScript).
If I link to my product page with those URLs, is there a risk of duplicate content? Is it good for my SEO?
thx
You need to use canonical URLs in order to let the search engines know that you are aware that the content seems duplicated.
Basically, using a canonical URL on your page www.mysite.com/product_1.html#/color-green that points to www.mysite.com/product_1.html tells the search engine that whenever it sees www.mysite.com/product_1.html#/color-green it should not index that page but rather the page www.mysite.com/product_1.html.
This is the suggested method to overcome duplicate content of this type.
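If it helps, the tag itself is a single line in the page's head, pointing at the URL you want indexed (using the product URL from the question):
<link rel="canonical" href="http://www.mysite.com/product_1.html" />
Since #/color-green is a fragment, the browser never sends it to the server, so both URLs are served by the same page and this one tag covers both.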
See these pages:
SEO advice: url canonicalization
A rel=canonical corner case
At one time I saw Google indexing the odd #ed URL and showing it in results, but it didn't last long. I think it also required that there was an on-page link to the anchor.
Google does support the concept of the hashbang (#!) as a specific way to do indexable anchors and support AJAX, which implies an anchor without the bang (!) will no longer be considered for indexing.
Either way, Google is not stupid. The basic use of the anchor is to move to a place on a page, i.e. it is the same page (duplicate content) but a different spot. So Google will expect a #ed URL to contain the same content. Why would they punish you for doing what the # is for?
And what is "the risk of duplicate content"? Generally, the only onsite risk from duplicate content is that Google may waste its time crawling duplicate pages instead of focusing on other valuable pages. As Google will assume a # is the same page, it is more likely not to even try the #ed URL.
If you're worried, implement the canonical tag, but do it right. I've seen more issues from implementing it badly than the supposed issues they are there to solve.
Both answers above are correct. Google has said they ignore the hash fragment unless you use the hash-bang format (#!), and that really only addresses a certain use case, so don't add it just because you think it will help.
Using the canonical link tag is the right thing to do.
One additional point about dupe content: it's less about the risk than about a missed opportunity. In cases where there are dupes, Google chooses one. If 10 sites link to your site using www.example.com and 10 more link using just example.com, you'll get the "link goodness" benefit of only 10 links. The complete solution to this involves ensuring that when users and Google arrive at the "wrong" one, the server responds with an HTTP 301 status and redirects them to the "right" one. This is known as domain canonicalization and is a good thing for many, many reasons. Use this in addition to the "canonical" link tag and other techniques.
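As a rough illustration of that 301 behaviour in Node.js (an illustration only; in practice this is usually configured in the web server or at the host):

const http = require('http');

http.createServer((req, res) => {
  // Permanently (301) redirect the bare domain to the www host.
  if (req.headers.host === 'example.com') {
    res.writeHead(301, { Location: 'http://www.example.com' + req.url });
    return res.end();
  }
  // ...otherwise serve the page as usual.
  res.end('Hello from www.example.com');
}).listen(80);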

How to get my website's menus and links displayed in the search results

If you search for "richfaces" in google.com, the first result will be about www.jboss.org/richfaces. You can see there that links (menus) like "Downloads", "Demos", "Documentations" are also displayed. How can I have these links displayed in the search results?
(The "description" meta tag is not enough, I suppose.)
You are not able to make Google show links to your site (they will do this if they deem your site relevant enough to warrant providing this feature). However, you can remove these links if they are present and they are inappropriate.
See http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=47334 for more details.
These are called Google Sitelinks. Google is pretty tight-lipped about how this feature is automated, but there are a handful of HTML5 tags which are supposed to help make search engines smarter. You can read more about them at O'Reilly's Dive Into HTML5 website. Especially interesting are the "Google Rich Snippets", though they're not exactly what you're looking for.
It might help to put those links in the HTML5 nav tags, like
<nav>Home About FAQ</nav>
and I've heard it tossed around that the site navigation should be an unordered list, but I don't know how true that is. Still, it couldn't hurt to do it that way and style the list with CSS.
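Putting both suggestions together, a minimal sketch of that markup (the link targets are made up) might be:

<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/about">About</a></li>
    <li><a href="/faq">FAQ</a></li>
  </ul>
</nav>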

What does "?ref=ts" mean in a Facebook application URL?

When Facebook drives traffic to an application, it often appends &ref=whatever to the query string. This is useful for figuring out which integration points are working or not. I've figured out what some of these mean. For example:
ref=bookmarks - the user clicked on a bookmark.
ref=game_my_recent - the user clicked on the upper portion of the games dashboard.
What does "ref=ts" mean? It accounts for a ton of traffic. I've viewed source on pages all over common Facebook pages and cannot find a match for ant piece of content generated by any of my applications.
Same question, posted by me on the Facebook developer forum:
http://forum.developers.facebook.com/viewtopic.php?id=54866
It means 'Top Search' (if you enter a query into the search box at the top and then click on something, it will append ref=ts).
As noted, ref=ts is appended to the url whenever a user makes a search in the Top Search input field.
Also note that people tend to copy/paste links in their website and blogs, without trimming useless GET strings.
So if you get a high number of referrers coming from the top search, it is possible that they are in fact links that have propagated outside of Facebook.

Can the Google Search appliance generate a report showing broken links on your site?

I know the Google Search Appliance has access to this information (as this factors into the PageRank Algorithm), but is there a way to export this information from the crawler appliance?
External tools won't work because a significant portion of the content is for a corporate intranet.
There might be something available from Google, but I have never checked. I usually use the link checker provided by the W3C. The W3C one can also detect redirects, which is useful if your server handles 404s by redirecting instead of returning a 404 status code.
You can use Google Webmaster Tools to view, among other things, broken links on your site.
This won't show you broken links to external sites though.
It seems that this is not possible. Under Status and Reports > Crawl Diagnostics there are two styles of report available: the directory drill-down 'Tree View' and the 100-URLs-at-a-time 'List View'. Some people have tried creating programs to page through the List View, but this seems to fail after a few thousand URLs.
My advice is to use your server logs instead. Make sure that 404 and referrer URL logging are enabled on your web server, since you will probably want to correct the page containing the broken link. You could then use a log file analyser to generate a broken link report.
To create an effective, long-term way of monitoring your broken links, you may want to set up a cron job to do the following (a rough sketch follows these steps):
Use grep to extract lines containing 404 entries from the server log file.
Use sed to remove everything except requested URLs and referrer URLs from every line.
Use sort and uniq commands to remove duplicates from the list.
Output the result to a new file each time so that you can monitor changes over time.
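The steps above map directly onto a grep/sed/sort/uniq pipeline; as a rough Node.js equivalent (the log path and the combined log format are assumptions), something like this could run from cron:

const fs = require('fs');

const lines = fs.readFileSync('/var/log/apache2/access.log', 'utf8').split('\n');
const broken = new Set();

for (const line of lines) {
  // Combined log format: ... "GET /path HTTP/1.1" 404 123 "http://referrer" "user agent"
  const m = line.match(/"(?:GET|POST|HEAD) (\S+) HTTP\/[\d.]+" 404 \S+ "([^"]*)"/);
  if (m) {
    broken.add(m[1] + '  <-  ' + m[2]); // requested URL and the page that linked to it
  }
}

// Write a dated report so you can watch the list change over time.
const today = new Date().toISOString().slice(0, 10);
fs.writeFileSync('broken-links-' + today + '.txt', [...broken].sort().join('\n') + '\n');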
A free tool called Xenu turned out to be the weapon of choice for this task. http://home.snafu.de/tilman/xenulink.html#Download
Why not just analyze your webserver logs and look for all the 404 pages? That makes far more sense and is much more reliable.
I know this is an old question, but you can use the Export URLs feature on the GSA admin console and then look for URLs with a state of not_found. This will show you all the URLs that the GSA has discovered but that returned a 404 when it attempted to crawl them.
