PRAW allows extracting submissions on a given subreddit between two timestamps using this:
reddit.subreddit('news').submissions(startStamp, endStamp)
However, I haven't been able to find anything similar for extracting a given user's comments between two timestamps. Can this be done? I don't mind the 1000-request limit as long as the comments I get fall within the correct time range. I already had a look at their documentation here.
Although there is no argument for it like there is for the .submissions call, you can do this manually with an if statement that checks created_utc against another UTC timestamp. (You can use something like https://www.epochconverter.com/ to get a desired timestamp.)
The following code sample gets all of /u/spez's comments from Christmas 2016 to Christmas 2017.
import praw

oldest = 1482682380.0  # timestamp for 12/25/16
newest = 1514218380.0  # timestamp for 12/25/17

reddit = praw.Reddit('USER-AGENT-HERE')
for comment in reddit.redditor('spez').comments.new(limit=None):
    if oldest < comment.created_utc < newest:
        print("Comment found! permalink: " + comment.permalink)
Consider referring to Pushshift. You can get the comments by a user (say, /u/avi8tr) at the following URL: Link.
There's a Python wrapper for Pushshift (just like PRAW), but it's under development: GitHub Link. You'll have to add the 'author' parameter to comment_search in psraw/endpoints.py, though.
Note: Both Pushshift and PSRAW are being actively developed, so expect changes.
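As a sketch of what such a request looks like (the author, after, before, and size parameter names come from Pushshift's comment-search API; the helper function itself is just my own illustration):

```python
from urllib.parse import urlencode

# Hypothetical helper: build a Pushshift comment-search URL for one
# author inside an epoch-seconds time window.
def pushshift_comment_url(author, after, before, size=100):
    base = "https://api.pushshift.io/reddit/search/comment/"
    query = urlencode({"author": author, "after": after,
                       "before": before, "size": size})
    return base + "?" + query

print(pushshift_comment_url("avi8tr", 1482682380, 1514218380))
```

The response is JSON whose data field holds the matching comments; you can page through longer histories by sliding the before/after window.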
Related
I have a reddit post link here:
https://www.reddit.com/r/dankmemes/comments/6m5k0o/teehee/
I wish to access the data of this post through the redditkit API.
http://www.rubydoc.info/gems/redditkit/
I have tried countless times and the docs don't make too much sense to me. Can someone help show how to do this through the ruby console? Or even an implementation of this in rails would be really helpful!
Looking at the #comment method on the gem, it takes a comment_full_name and performs a GET to api/info.json with that parameter as an id (as seen in the source for that method). If we look at the reddit API docs for api/info, the id parameter is a full name for the object, with a link explaining what a full name is.
Following that link, a full name for a comment is
Fullnames start with the type prefix for the object's type, followed by the thing's unique ID in base 36.
and
type prefixes
t1_ Comment
So now we know the comment_full_name should be t1_#{comment's unique id}. From your URL that appears to be 6m5k0o (strictly, that segment is the submission's ID; a specific comment's ID is the final segment of that comment's permalink). Reddit IDs like these are already in base 36, so no conversion should be needed. Without seeing what you've tried, I would say
client = RedditKit::Client.new 'username', 'password'
client.comment("t1_6m5k0o")
and if that doesn't work
client.comment("t1_#{'6m5k0o' base36 encoded}")
For questions like this, it would be nice to see some of your code, what you tried, and the results you got. For all I know, you've already tried this and have a reason it didn't work for you.
I would test this out for you, but I don't have a reddit account for the gem to sign in with, this is just my guess glancing at the documentation.
I'm using PRAW to work with reddit submissions, specifically submissions that have been resolved and have their "flair" attribute set to SOLVED (as described here).
However, I am getting "None" when I check for flair, even for submissions that I can see have been set to SOLVED.
I have the following code, which works with a submission that has definitely been set to SOLVED.
solvedSubmission = reddit.submission(url='https://www.reddit.com/r/PhotoshopRequest/comments/6ctkpj/specific_can_someone_please_remove_kids_12467_i/')
pprint.pprint(vars(solvedSubmission))
This outputs:
{'_comments_by_id': {},
'_fetched': False,
'_flair': None,
'_info_params': {},
'_mod': None,
'_reddit': <praw.reddit.Reddit object at 0x10e3ae1d0>,
'comment_limit': 2048,
'comment_sort': 'best',
'id': '6ctkpj'}
Can anyone offer any insight as to why I'm seeing "None", on this post and other solved posts? Is there another way that reddit keeps track of solved posts that I should look into?
Thank you!
By now (~1y after OP) you might have solved this already, but it came up in a search I did, and since I figured out the answer, I will share.
The reason you never saw any relevant information is because PRAW uses lazy objects so that network requests to Reddit’s API are only issued when information is needed. You need to make it non-lazy in order to retrieve all of the available data. Below is a minimal working example:
import praw
import pprint
reddit = praw.Reddit() # potentially needs configuring, see docs
solved_url = 'https://www.reddit.com/r/PhotoshopRequest/comments/6ctkpj/specific_can_someone_please_remove_kids_12467_i/'
post = reddit.submission(url=solved_url)
print(post.title) # this will fetch the lazy submission-object...
pprint.pprint(vars(post)) # ... allowing you to list all available fields
In the pprint-output, you will discover, as of the time of writing this answer (Mar 2018), the following field:
...
'link_flair_text': 'SOLVED',
...
... which is what you will want to use in your code, e.g. like this:
is_solved = 'solved' == post.link_flair_text.strip().lower()
So, to wrap this up: you need to make PRAW issue a network request to turn the lazy submission into a fully fetched one. Accessing any not-yet-loaded attribute (like post.title above) triggers the fetch, after which fields such as link_flair_text are available.
Does anyone know how to get the CC for any YouTube video that has captions available? I know the API 2.0 documentation says captions are only available to the owner of the video... but I was able to get some videos' captions even though I'm not the owner of any.
There are two APIs (or links to APIs) that can be used; they both route to the timedtext API.
Before I mention them, we should note the parameters the API needs, which are:
lang: {en, fr, ...}. Required.
v: {video ID}. Required.
name: the track name. Required only if the track has one set (and this is exactly my problem).
tlang: the language to translate into. Optional (set it if you want the CC translated into another language).
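Putting those parameters together, the request URL can be assembled like this (a minimal sketch; the helper name is my own):

```python
from urllib.parse import urlencode

# Sketch: assemble a timedtext request URL from the parameters listed above.
# Only v and lang are always sent; name and tlang are included when set.
def timedtext_url(v, lang, name=None, tlang=None):
    params = {"lang": lang, "v": v}
    if name is not None:
        params["name"] = name
    if tlang is not None:
        params["tlang"] = tlang
    return "http://video.google.com/timedtext?" + urlencode(params)

print(timedtext_url("PILzP-bIeLo", "fr", name="french"))
# http://video.google.com/timedtext?lang=fr&v=PILzP-bIeLo&name=french
```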
The API links are:
http://video.google.com/timedtext?lang=fr&v=PILzP-bIeLo&name=french
Note the above example would return nothing if you removed name=french or set it to something else...
http://www.youtube.com/api/timedtext?v=zzfCVBSsvqA&lang=en
Note this example would return nothing if you set the name parameter.
http://www.youtube.com/api/timedtext?v=ZdP0KM49IVk&lang=en
yet the actual video has captions.
Example 3 does not return the CC data.
So I'm guessing example 3 needs the name parameter set. My main problem is finding out whether the name parameter is set, and if it is, what its value is.
[Update]: This was the preferred method until Google discontinued it (writing as of Dec 2021).
Your first example should work without the name= part.
This did the job for me:
video.google.com/timedtext?lang={languageID}&v={videoId}
To fetch the english CC version from the previous answer, it would look like this:
http://video.google.com/timedtext?lang=en&v=zzfCVBSsvqA
You can get the list of available captions with a request to http://video.google.com/timedtext?type=list&v=zzfCVBSsvqA.
Your 3rd video has only automatically generated captions, which you cannot fetch easily.
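To see where the name parameter comes from: the type=list endpoint returned XML whose <track> elements carry the lang_code and name attributes, so you can read the required name off that list. A sketch of the parsing (the sample XML is illustrative, not a captured response):

```python
import xml.etree.ElementTree as ET

# Illustrative sample of what the type=list endpoint returned; a real
# response has one <track> element per available caption track.
sample = """<transcript_list docid="123">
  <track id="0" name="" lang_code="en"/>
  <track id="1" name="french" lang_code="fr"/>
</transcript_list>"""

def list_tracks(xml_text):
    # Return (lang_code, name) for every track; an empty name means the
    # name= parameter can be omitted when fetching that track.
    root = ET.fromstring(xml_text)
    return [(t.get("lang_code"), t.get("name")) for t in root.findall("track")]

print(list_tracks(sample))  # [('en', ''), ('fr', 'french')]
```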
Here are my suggestions after spending some time on this:
JS library: https://github.com/syzer/youtube-captions-scraper => supports auto-generated captions.
The two quick methods below do not support auto-generated captions:
Get a list of subtitles: http://video.google.com/timedtext?type=list&v=lT3vGaOLWqE
Get subtitle with track id: http://video.google.com/timedtext?type=track&v=lT3vGaOLWqE&id=0&lang=en
Quick download:
http://downsub.com/?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dag_EJRhMfOM
If video.google.com does not fetch your closed-caption file, or you don't want the file in XML format but would rather have SRT (see note below), try:
CC SUBS
NOTE: SRT can be transformed into virtually ANY format, either with free subtitling tools or with simple search-and-replace: replacing \n\n with |, then \n with ;, and finally | with \n yields a CSV-like file that can be opened in a spreadsheet, for example.
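That replacement recipe can be sketched in Python (assuming subtitle blocks are separated by single blank lines and the caption text itself contains no | or ; characters):

```python
def srt_to_csv(srt_text):
    # 1) mark block boundaries, 2) join each block's lines with ';',
    # 3) restore one line per block.
    step1 = srt_text.strip().replace("\n\n", "|")
    step2 = step1.replace("\n", ";")
    return step2.replace("|", "\n")

srt = ("1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
       "2\n00:00:03,000 --> 00:00:04,000\nWorld")
print(srt_to_csv(srt))
# 1;00:00:01,000 --> 00:00:02,000;Hello
# 2;00:00:03,000 --> 00:00:04,000;World
```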
I'm developing a SketchUp plugin in Ruby. I have coded the parsing process successfully and got the cpoints from the CSV file into SketchUp. The CSV file also contains a description alongside the coordinates of every point, like: ["15461.545", "845152.56", "5464.59", "tower1"].
I want to add tower1 as a text label associated with each point.
How can I do that?
PS: You don't need to extract tower1 from the array; I've already done that. I now have the descriptions in an independent variable like:
desc_array = ["tower1", "beacon48", "anna55", ...]
Please help me
I did that by
todraw_su.each do |todraw|
  desc_array.each do |desc|
    model.entities.add_text desc, todraw
  end
end
But I found a problem with the nested each statements: for every todraw, the inner loop runs over every desc, so each point gets every description.
If you know how I can do it, answer on this question Each statement inside another each malfunctions?
I am using the Twitter Search API and I can't understand the id field of a tweet.
For example, here is one: <id>tag:search.twitter.com,2005:1990561514</id>. The real ID is the final number part, right? Why doesn't Twitter provide it on its own in a single element? And why is there a year, 2005, in the ID field? Is the ID indexed to the year, with IDs resetting to zero each year?
I'm asking because I'm going to use the since_id option to retrieve new tweets. If the ID isn't really unique and depends on the year, it won't work as expected.
Thanks.
The tag is unique - but parts of it are redundant.
tag:search.twitter.com,2005:1990561514
Obviously, search.twitter.com is the URL from where you requested the document.
The ,2005 is constant. As far as I can tell, it has never changed since the service launched. While there's no official documentation, I would guess that it refers to the Atom specification namespace: http://www.w3.org/2005/Atom
Finally, the long number is the Tweet's status ID. It will always be unique and can be used for the since_id.
What you will need to do is split the string, and just use the number after the colon as your ID.
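For instance, in Python that extraction could look like:

```python
# Everything after the last colon of the tag is the numeric status ID,
# which is the value to feed into since_id.
tag = "tag:search.twitter.com,2005:1990561514"
status_id = tag.rsplit(":", 1)[-1]
print(status_id)  # 1990561514
```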
I believe you are doing something wrong. If you look at all of the example results from the Twitter Search API, none of the id fields are formatted like this one you are showing.
For example:
http://search.twitter.com/search.json?q=%40twitterapi%20-via
Also, if you check out the example requests page, you will see that all of the id fields have normal formats, i.e.:
"id":122032448266698752
Update:
Now that I know you are using the atom feed, I can see where the seemingly oddly formatted element comes from. See this article on avoiding duplicates in atom feeds. Another helpful article.
Basically, atom feeds REQUIRE a unique id for each element in a feed. Some feeds use the "tag" scheme to ensure uniqueness. This format is actually pretty common in atom feeds and many frameworks use it by default. For instance, the RoR AtomFeedHelper (which might even be what Twitter uses) specifies the default format to be:
"tag:#{request.host},#{options}:#{request.fullpath.split(".")}"