Trying to search reddit submissions and subreddits based on comment word search - reddit

Can you search for submissions (across all subreddits) to find those in which a particular keyword is used in the discussion, i.e. in the comments of that submission?
I am trying to:
Collect all the submission ids, and hence the comments, that contain this word, say "awesome".
I would also like to know whether some subreddits are likely to use this word often, based on the comments.
I have already looked at the PRAW and PSAW documentation and even found the exact solution to this problem, i.e. the aggs parameter, but apparently it is not working at the moment. Every query returns an empty JSON, even with the same code as in the documentation.
I was wondering whether there are alternatives or workarounds, or anything else that could solve my problem.
Thanks in advance for your time.
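One possible workaround, if the server-side aggregation stays unavailable, is to do the grouping client-side once the comments have been fetched by whatever means still works. The sketch below assumes each comment is a plain dict with `id`, `link_id` (the submission it belongs to), `subreddit` and `body` keys; the function name and the sample data are hypothetical, and fetching the comments is left out entirely.

```python
import re
from collections import Counter, defaultdict


def summarize_keyword_hits(comments, keyword):
    """Group matching comment ids by submission and count hits per subreddit."""
    # Whole-word, case-insensitive match, so "awesomeness" does not count.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    by_submission = defaultdict(list)
    per_subreddit = Counter()
    for comment in comments:
        if pattern.search(comment.get("body", "")):
            by_submission[comment["link_id"]].append(comment["id"])
            per_subreddit[comment["subreddit"]] += 1
    return dict(by_submission), per_subreddit


# Hypothetical sample records, standing in for fetched comments.
sample = [
    {"id": "c1", "link_id": "t3_aaa", "subreddit": "python", "body": "This is awesome!"},
    {"id": "c2", "link_id": "t3_aaa", "subreddit": "python", "body": "Agreed."},
    {"id": "c3", "link_id": "t3_bbb", "subreddit": "movies", "body": "Awesome film"},
    {"id": "c4", "link_id": "t3_ccc", "subreddit": "python", "body": "awesomeness overload"},
]

submissions, subreddits = summarize_keyword_hits(sample, "awesome")
print(submissions)
print(subreddits.most_common())
```

The per-subreddit counter only reflects the comments that were actually collected, so it says something about "likely" subreddits only if the sample is reasonably broad.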

Related

Solved: Extract date from my Substack webpage to Google Sheets

longtime lurker, first-time poster. I usually solve my issues & upvote without needing to post, but I've been stumped all weekend!
Edit: Erik solved it:
I was looking for an answer to extract the "datePublished" or "dateModified" from a Substack article in a Google Sheet.
Goal: This will tell me when it was the last date/time I updated, for example, my PS5 restock guide, my Walmart PS5 restock guide, etc. If it's too stale, I try to add relevant information. Having it in Google Sheets makes it streamlined as there are dozens of guides.
Test Google Sheet:
https://docs.google.com/spreadsheets/d/1hLBFMWCTc2hpC-1C8Sxd5OVREdNHTVTtrJsAAU5Jl94/edit#gid=0
I've done this before for other sites I've worked at, but there appears to be no date in the meta data on Substack :/ (I could be wrong, as I'm no expert at reading XPATH)
I do see this in the body for the linked example:
<time datetime="2022-07-29T11:52:00.000Z">Jul 29</time>
I've been trying things like this (where E17 is where I put the article URL in Google Sheets) to no effect.
=REGEXEXTRACT(IMPORTXML(E17, "//time[@datetime='datePublished']/@content"), "(.+)T")
I've been mostly working off of this StackOverflow solution, but I haven't been able to apply the same finding to Substack's formatting.
If you want to grab it directly using a Google Sheets formula, this should work for you:
=ArrayFormula(IFERROR(VLOOKUP("*",FLATTEN(IFERROR(REGEXEXTRACT(IMPORTXML("https://www.theshortcut.com/p/ps5-restock","//div[2]"),"Swider(.?.?.?.?\d\d{1}[hrago\s]*)"))),1,FALSE),"???"))
To set realistic expectations, I usually can't invest this much time into working out such a solution on this forum. But I'm on vacation at the moment and filling time while my guest is otherwise occupied.
One further note: this is specific to the two sites you gave as examples. It will only work for sites where the second <div> holds this information and only where the data exists as strings exactly like those found on these two sites (including the poster's last name as "Swider").
ADDENDUM:
Looking at this further, did you try simply the following?
=IMPORTXML(C2, "//time")
(assuming your URL is in C2, etc.)
This seems to work for me, given that it appears the date/time data you want is contained within the first <time> element on the web page.
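Outside of Google Sheets, the same idea (take the first <time> element and keep the date part of its datetime attribute) can be sketched in Python with only the standard library. This is an illustration under assumptions: the class and function names are made up, and the HTML is assumed to have been downloaded already; the snippet uses the <time> element quoted in the question.

```python
from html.parser import HTMLParser


class TimeCollector(HTMLParser):
    """Collect the datetime attribute of every <time> element."""

    def __init__(self):
        super().__init__()
        self.datetimes = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.datetimes.append(value)


def first_published_date(html):
    """Return the date part (YYYY-MM-DD) of the first <time datetime=...>, or None."""
    parser = TimeCollector()
    parser.feed(html)
    if not parser.datetimes:
        return None
    # Same effect as the "(.+)T" regex in the question: keep what precedes the T.
    return parser.datetimes[0].split("T")[0]


snippet = '<div><time datetime="2022-07-29T11:52:00.000Z">Jul 29</time></div>'
print(first_published_date(snippet))  # 2022-07-29
```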

Google Analytics filters. Subsetting results to specific URLs with no parameters or question marks

I have not had any luck finding an understandable answer in this forum, so I decided to ask my own question. I apologize if I have missed an existing post.
Briefly, I want statistics for certain pages that I can select by setting a filter on the URL. The problem is that I also find some visits that were made while administering the site (Joomla), which show query strings.
I would like to get results from pages under, let's say, /index.php/certain_group/
(e.g.
/index.php/certain_group/this-page,
/index.php/certain/group/another-page)
but not those like
/index.php/certain/group/another-page?view=form&layout=edit&a_id=89&return=aHR0cCUzQSUyR...bla bla
I have tried lots of combinations in http://www.analyticsmarket.com/freetools/regex-tester
I am only able to match the ones that I do not want. That is, if I use "/index.php/group/.*\?.*$"
I get
/index.php/certain/group/another-page?view=form&layout=edit&a_id=89&return=aHR0cCUzQSUyR...bla bla
Any clue?
Thanks in advance
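One direction that may help: rather than trying to exclude the query string, require that no question mark appears anywhere after the prefix. The sketch below checks such a pattern in Python against the paths from the question. The pattern is my suggestion rather than something confirmed for Google Analytics, but it sticks to basic constructs (anchors and a negated character class) that most regex engines support.

```python
import re

# Prefix, then any run of characters that are not "?", then end of string.
CLEAN_PAGE = re.compile(r"^/index\.php/certain/group/[^?]*$")


def is_clean_page(path):
    """True for pages under the prefix that carry no query string."""
    return CLEAN_PAGE.search(path) is not None


paths = [
    "/index.php/certain/group/this-page",
    "/index.php/certain/group/another-page",
    "/index.php/certain/group/another-page?view=form&layout=edit&a_id=89",
]
for path in paths:
    print(path, is_clean_page(path))
```

Only the first two paths match; the third is rejected because of the "?".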

Display random single survey poll in side bar, show results, next poll?

We're developing an app in Rails, and want to randomly display a question to users that they haven't seen yet. Once they answer, it would show the results, and then ask if they want to answer the next question.
Has anyone done this? Is there perhaps some kind of gem that can help us, or do we have to write it from scratch?
Thanks in advance!
I've used the randumb gem for something similar. With it you can use scopes to chain your queries so you could fetch only an un_seen record (and subsequently update it to seen)
https://github.com/spilliton/randumb
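Independent of any gem, the selection logic itself is small. Here is a language-neutral sketch in Python, not Rails code and not how randumb works internally: it assumes you can list all question ids and the ids a given user has already answered, and all names are hypothetical.

```python
import random


def next_unseen_question(question_ids, seen_ids, rng=random):
    """Pick a random question the user has not seen yet, or None if none remain."""
    unseen = [q for q in question_ids if q not in seen_ids]
    if not unseen:
        return None
    return rng.choice(unseen)


all_questions = [1, 2, 3, 4]
seen = {1, 2}

choice = next_unseen_question(all_questions, seen)
print(choice)      # either 3 or 4
seen.add(choice)   # mark as seen once the user answers
```

In a real app the filtering would normally be done in the database query rather than in memory.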

Rails - extract seo keywords from block of text

I need to generate seo meta keyword tags based upon user generated wiki content.
Say I have an article and a predefined list of keywords/phrases, is there some good method to grab matched article keywords? Keywords may not be of one word length and will be given a predefined weight as to which keywords are used first. Some implementation of Nokogiri seems the obvious choice but I wondered if there were something more complete for this exact scenario.
You could process your text thanks to a semantic API, it will give you a list of potential keywords + the score associated.
I've begun to develop this gem: https://github.com/apneadiving/SemExtractor
It still needs some improvements for error handling but it's fully operational to query the following engines:
Zemanta
Semantic Hacker from Textwise
Yahoo Boss
OpenCalais
If you're only wanting to grab keywords for the meta keyword tag, that's not really worth your time. Google doesn't pay attention to those anymore.
Here's a good post about it, with a video of Matt Cutts from Google explaining that the meta keyword tag doesn't play a part in search engine rankings.
http://www.stepforth.com/blog/2010/meta-keyword-tag-dead-seo/
What is worth your time? Good title tags.
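Going back to the matching described in the question (a predefined list of possibly multi-word keywords, each with a weight), a minimal sketch without any external service could look like the following. It is written in Python for illustration only; the function name, the weights and the sample text are assumptions, and it expects plain text, so markup would need to be stripped first.

```python
import re


def match_keywords(text, weighted_keywords, limit=10):
    """Return keywords found in text, highest weight first, at most `limit` of them."""
    hits = []
    for phrase, weight in weighted_keywords.items():
        # Whole-phrase, case-insensitive match; works for multi-word keywords.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            hits.append((weight, phrase))
    # Sort by descending weight, then alphabetically for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [phrase for _, phrase in hits[:limit]]


article = "Ruby on Rails makes building a wiki engine straightforward."
keywords = {"ruby on rails": 5, "wiki engine": 3, "django": 4}
print(match_keywords(article, keywords))  # ['ruby on rails', 'wiki engine']
```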

What is the number in a twitter status update URL?

My latest Twitter status update has the URL http://twitter.com/dinomite/status/1743967905 Does anyone know if there is any rhyme or reason behind the number 1743967905? It looks to me as though it might be a sequentially assigned number across all Twitter users; I certainly haven't updated 1.7b times, but all of Twitter might be around that. Anyone know?
According to the Twitter API docs it is the numerical ID of the status. I'd guess it's unique and sequential across all users, but I don't know for sure. If you need to know more take a look here for the official docs.
That would probably be the primary key in the status table. That URL format is used to look at a specific tweet.
Note that http://twitter.com/dinomite is the actual feed.
Also, based on how it seems to grow incrementally, this is probably an IDENTITY column or similar.
Yes, they're sequential.
One of the things it's helpful for is when writing a Twitter client: you can ask for anything newer than the last number you've already seen, so that you don't fetch and have to parse duplicates.
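A rough sketch of that client-side bookkeeping, assuming ids grow over time as the answers above suggest. The statuses are plain dicts with an `id` key, the function name is made up, and apart from the id quoted in the question the sample ids are invented.

```python
def newer_than(statuses, since_id):
    """Return statuses with an id above since_id, plus the updated high-water mark."""
    fresh = [s for s in statuses if s["id"] > since_id]
    # Keep the old mark if nothing new arrived.
    new_since_id = max((s["id"] for s in fresh), default=since_id)
    return fresh, new_since_id


last_seen = 1743967905
batch = [
    {"id": 1743967000, "text": "older status"},
    {"id": 1743967905, "text": "the one already seen"},
    {"id": 1743968200, "text": "new status"},
]

fresh, last_seen = newer_than(batch, last_seen)
print([s["text"] for s in fresh])  # ['new status']
print(last_seen)                   # 1743968200
```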
