How does Google Search Console detect pages with bad Web Vitals? - lighthouse

I can see in Google Search Console a group of 30 pages flagged as "needs improvement" for LCP (longer than 2.5 s). I checked all 20 listed example URLs in PageSpeed Insights: there is no Field Data, but the Lab Data LCP was under 1 s for each page.
Why does Google Search Console report pages that have no Field Data? Is it possible that in a group of 30 pages only one is actually slow, and the others inherit the bad LCP because their own field data is missing (each page shows an Origin LCP of 2.6 s)?

The Google Search Console (GSC) report is based on field data sourced from the Chrome User Experience Report (CrUX). When GSC does not have enough field data for an individual URL, it groups similar pages and applies the group's (or the origin's) metrics to all of them.
PageSpeed Insights only reports on individual pages or on the whole site (origin).
So GSC sits in a middle ground where it reports on groups of similar pages. That is why a page with fast lab data can still be flagged: it inherits the field LCP of its group or origin.
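If you want to check what field data (if any) CrUX actually holds for an individual page versus the origin, you can query the CrUX API yourself. Below is a minimal Ruby sketch; the API key, example URLs and the PHONE form factor are placeholders you would substitute.

require 'net/http'
require 'json'
require 'uri'

API_KEY  = 'YOUR_CRUX_API_KEY'   # placeholder -- use your own key
ENDPOINT = URI("https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=#{API_KEY}")

# Returns the 75th-percentile LCP in milliseconds, or nil if CrUX has no data for the record.
def crux_p75_lcp(request_body)
  response = Net::HTTP.post(ENDPOINT, request_body.to_json, 'Content-Type' => 'application/json')
  return nil unless response.is_a?(Net::HTTPSuccess)   # a 404 means no field data
  record = JSON.parse(response.body)
  record.dig('record', 'metrics', 'largest_contentful_paint', 'percentiles', 'p75')
end

page_lcp   = crux_p75_lcp('url' => 'https://example.com/some-page', 'formFactor' => 'PHONE')
origin_lcp = crux_p75_lcp('origin' => 'https://example.com', 'formFactor' => 'PHONE')

puts "page-level field LCP:   #{page_lcp.inspect} ms"   # often nil for low-traffic pages
puts "origin-level field LCP: #{origin_lcp.inspect} ms" # this is what grouped pages inherit

If the per-URL call returns nothing while the origin call returns an LCP around 2.6 s, that matches the behaviour described above: the grouped pages are being judged by origin-level field data, not by their own.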

Related

Building a Twitter Search Box With Search Suggestions

I am developing a site that integrates Twitter content, and I would like to enhance my search box by providing search suggestions for hashtags and handles as the user types. Is there any way to get this autocomplete data from Twitter?
thanks().InAdvance();
There isn't anything in the Twitter API that does that. Besides, it wouldn't work anyway, because the rate limits would never permit that type of interaction: you might only have n queries in a 15-minute window. If autocomplete eats up that much of the rate limit, it leaves less for iterating through the rest of the results and supporting subsequent queries, leaving the user waiting until the next 15-minute window. I understand what you want to do, but 3rd-party APIs like Twitter's expose very specific, pre-defined functionality and don't work like a general-purpose database.
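To make the rate-limit arithmetic concrete: a naive autocomplete fires one API call per keystroke, so even a generous budget disappears quickly. A back-of-the-envelope sketch, where the 180-requests-per-window budget and the 8-character average query are assumed example numbers, not actual Twitter limits:

window_seconds      = 15 * 60
request_budget      = 180.0   # assumed example budget per 15-minute window
calls_per_keystroke = 1       # a naive autocomplete queries the API on every keystroke
avg_chars_typed     = 8       # assumed average length of a typed query

sessions_per_window = request_budget / (avg_chars_typed * calls_per_keystroke)
puts "Autocomplete sessions servable per window: #{sessions_per_window.round}"        # => 23
puts "Seconds of budget per request: #{(window_seconds / request_budget).round(1)}"   # => 5.0

A couple of dozen typing sessions per window leaves essentially nothing for fetching the actual search results.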

When web crawling how do I find patterns of low quality urls and crawl these type of urls less?

URLs like exmpl.com/search.php?q=hey come in a huge variety of GET-parameter combinations, and I want to classify such links to keep my crawler away from these "low priority" URLs.
It depends on what you're crawling and what you want to do with it, and whether it's a few specific websites or a broad crawl. Sometimes the owners of the websites don't want you to crawl those URLs either, because they generate additional traffic (traffic which is useless for both sides), and they may use the robots.txt file for that too. Give it a look (you should respect it anyway).
These low quality URLs, as you call them, may also happen with:
e-shops where you continuously add items to the cart and the back office gets messed up with the orders
blog platforms where you click on comments, replies, likes and so on with a weird result
crawler traps from calendars or other infinite URLs where only the parameters change but the page is the same
link farms, especially classified ads websites where each product or region is generated as a subdomain and you end up having thousands of subdomains for the same website; even if you have a limited number of URLs downloaded per website, this order of magnitude just takes over your crawl
If you include contact information in your user agent, site owners will sometimes contact you to ask you to stop crawling specific types of URLs, or to agree with you on what should be crawled and how (for instance, the number of requests per second).
So, it depends on what you're trying to crawl. Look at the frontier and try to find weird behaviours:
hundreds of URLs for the same website where the URL is mostly the same and only one or a few parameters change
hundreds of URLs for blog platforms or e-shops where the parameters look weird or keep repeating (look at those platforms and try to find patterns in them, like (.*\?widgetType=.*) or (.*\&action=buy_now.*))
URLs that look like being from calendars
hundreds of URLs that interact with forms that you're not interested in submitting information to (like the search you mentioned)
more URLs for a website than you'd expect for that website, or for a website of that type
more subdomains for a website than you'd expect
websites with a high number of 403, 404, 500 or other non-200 codes, and which URLs are responsible for that
a frontier that doesn't stop growing, and which websites and URLs are responsible for that (which URLs are being added in such volume that they make it grow abnormally)
All those URLs are good candidates to be excluded from the crawl. Identify the common part and use it as a regular expression in the exclusion rules, as in the sketch below.
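A minimal sketch of what such exclusion rules can look like in practice; the patterns below are hypothetical examples in the spirit of the ones mentioned above, not a definitive list:

# Hypothetical exclusion rules -- tune the patterns to what you see in your own frontier.
EXCLUDE_PATTERNS = [
  %r{/search\.php\?q=},       # on-site search result pages
  /[?&]widgetType=/,          # blog-platform widget parameters
  /[?&]action=buy_now/,       # e-shop cart/checkout actions
  %r{/calendar/\d{4}-\d{2}}   # calendar-style infinite URL spaces
].freeze

def low_priority?(url)
  EXCLUDE_PATTERNS.any? { |pattern| url.match?(pattern) }
end

candidates = [
  'https://exmpl.com/search.php?q=hey',
  'https://exmpl.com/jobs/123',
  'https://blog.example.org/post?widgetType=archive'
]

puts candidates.reject { |url| low_priority?(url) }   # only https://exmpl.com/jobs/123 survives

The same list can be kept per website, since a pattern that marks junk on one site may be perfectly good content on another.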

Viewing overall Fusion Tables account usage

Fusion Tables has a 1 GB-per-account limit. I have been unable to figure out where to view my account's usage so far. Where is this information?
Edit: the Google Developers Console does not display this information.
It's in your Google Cloud Console:
Quota: https://console.cloud.google.com/apis/api/fusiontables/quotas
If you have more than one project, make sure the desired project is selected in the top right of the window (in the blue header bar, there's a dropdown menu). Those links should get you to the general area.
Edit: The asker is correct in that the Google Cloud Console does not have a place to see overall account usage or bytes transferred (Google only tracks API requests for Google Fusion Tables). The API limits are attached to each project rather than to the Google account as a whole. You can view the quota of each API for an individual project on a single page: https://console.cloud.google.com/quotas?usage=ALL

Rails current visitor count

How does one implement a current visitors count for individual pages in Rails?
For example, a property website has a list of properties and, for each individual listing, a remark that says:
"there are 6 people currently looking at this property".
I'm aware of the impressionist gem, which is able to log unique impressions for each controller. Just wondering if there is a better way than querying
impressions.where("created_at <= ?", 5.minutes.ago).count
for each object in the array.
Before you get downvoted, I'll give you an idea of how to do it
Recording visitors is in the realm of analytics, of which Google Analytics is the most popular & recognized
Analytics
Analytics systems work with 3 parts:
Capture
Processing
Display
The process of capturing & processing data is fundamentally the same everywhere -- you put a JS snippet on your site that sends a request to the server with the attached user data; processing then writes that data into your database
Displaying The Data
The difference for many people is the display of the data they capture
Google Analytics displays the data in their dashboard
eBay displays the data as "x people bought this in the past hour"
You want to show the number of people viewing an item
The way to do this is to hard-code the processing aspect of the data into your app
I can't explain the exact way to do this, because it's highly dependent on your stack, but this is the general way to do it
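Since the question already mentions the impressionist gem, here is one rough sketch of that hard-coded processing step for the Rails case. It assumes impressionist's polymorphic impressions table (impressionable_type / impressionable_id / created_at); the method names and caching choices are illustrative, not a definitive implementation:

# app/models/property.rb -- sketch, assuming the impressionist gem is installed
class Property < ApplicationRecord
  is_impressionable   # impressionist's model macro

  # One grouped query for a whole page of listings instead of one COUNT per object,
  # cached briefly so repeated page loads don't hit the database every time.
  def self.current_viewer_counts(property_ids, window: 5.minutes)
    Rails.cache.fetch(['current-viewers', property_ids], expires_in: 30.seconds) do
      Impression.where(impressionable_type: name, impressionable_id: property_ids)
                .where('created_at >= ?', window.ago)
                .group(:impressionable_id)
                .count
    end
  end
end

# In the controller:
#   @viewer_counts = Property.current_viewer_counts(@properties.map(&:id))
# In the view, per listing:
#   "there are #{@viewer_counts[property.id].to_i} people currently looking at this property"

For higher traffic you would move the capture step out of the request cycle entirely (for example a Redis counter keyed by property id with a short TTL), but the capture / processing / display shape stays the same.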

From a development perspective, how does the indeed.com URL structure and site work?

On the webmaster's Q and A site, I asked the following:
https://webmasters.stackexchange.com/questions/42730/how-does-indeed-com-make-it-to-the-top-of-every-single-search-for-every-single-c
But, I would like a little more information about this from a development perspective.
If you search Google for anything job related, for example, Gastonia Jobs (City + jobs), then, in addition to their search results dominating the first page of Google, you get a URL structure back that looks like this:
indeed.com/l-Gastonia,-NC-jobs.html
I am assuming that the l stands for location in the URL structure. If you do a search for an industry-related job, or for a job with a specific company name, you will get back something like the following (Microsoft jobs):
indeed.com/q-Microsoft-jobs.html
With just over 40,000 cities in the USA, I thought: OK, maybe they looped through them and created a page for every single one; that would not be hard for a computer. But the site is obviously dynamic, as each of those pages has tens of thousands of results paginated by 10. The q above obviously stands for query. The locations I can understand, but they cannot possibly have created a web page for every single query combination, could they?
OK, it gets a tad weirder. I wanted to see if they had a sitemap, so I typed "indeed.com sitemap.xml" into Google and got the response:
indeed.com/q-Sitemap-xml-jobs.html
Again, I searched for "indeed.com url structure" and, as I mentioned in the other post on Webmasters, I got back:
indeed.com/q-change-url-structure-l-Arkansas.html
Is indeed.com somehow using programming to create a web page on the fly based on my search input into Google? If they are not, how are they able to have a static page for millions upon millions of possible query combinations, have them paginate dynamically, and then have all of those dominate Google's first page of results (albeit that last question may be best for the Webmasters Q&A)?
Does the JavaScript in the page somehow interact with the URL?
It's most likely not a bunch of pages. The "actual" page might be http://indeed.com/?referrer=google&searchterm=jobs%20in%20washington. The site then cleverly produces a human-readable URL using URL rewriting, fetches the jobs in the database that match the query, and voilà...
I could be dead wrong, of course. Truth be told, the technical aspect of it can probably be solved in a multitude of ways. For example, every time a job is added to the site, all the pages needed to match that job might be generated, producing an enormous number of pages for Google to crawl.
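To illustrate that rewrite idea (this is pure speculation about Indeed's internals, not their actual code), a couple of patterns are enough to turn those pretty URLs back into ordinary search parameters:

# Speculative sketch of the URL-rewrite idea -- not Indeed's real implementation.
def parse_pretty_job_url(path)
  patterns = [
    %r{\A/q-(?<query>.+)-l-(?<location>.+)-jobs\.html\z},
    %r{\A/q-(?<query>.+)-jobs\.html\z},
    %r{\A/l-(?<location>.+)-jobs\.html\z}
  ]
  patterns.each do |pattern|
    if (m = pattern.match(path))
      captures = m.named_captures
      return { query:    captures['query']&.tr('-', ' '),
               location: captures['location']&.tr('-', ' ') }
    end
  end
  nil   # not a recognised search URL
end

p parse_pretty_job_url('/l-Gastonia,-NC-jobs.html')   # location "Gastonia, NC", no query
p parse_pretty_job_url('/q-Microsoft-jobs.html')      # query "Microsoft", no location

The search handler then treats those parameters like any other query, so Google sees millions of crawlable, static-looking URLs without anyone having authored millions of pages by hand.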
This is a great question; however, it remains unanswered, on the grounds that, first, a basic Google search using
site:indeed.com
returns over 120 million results, and secondly, a query such as "product manager new york" ranks #1 in the results. These pages are obviously pre-generated, which is confirmed by the fact that the page cached by the search engine (sometimes several days earlier) shows different results from a live query on the site.
Easy: when Google's search bot crawls the pages on Indeed or any other job search site, those pages are dynamically created. Here is another site I run that works similarly to Indeed: http://jobuzu.co.uk.
PHP is your friend here, and Indeed doesn't just use standard databases; look into Sphinx and Solr, as they offer full-text search with better performance than MySQL, etc.
They also make clever use of rel="canonical" and thorough internal linking:
http://www.indeed.com/find-jobs.jsp
Notice that all the pages that actually rank can be found from that direct internal link structure.
