Reddit API: return the text of a comment or a post's selftext

After reading the documentation I still can't understand how it all fits together. What I am trying to accomplish is simple: given a URL, return the text content at that URL.
For example:
import praw
r = praw.Reddit(user_agent='my_cool_app')
post = "http://www.reddit.com/r/askscience/comments/10kp2h\
/lots_of_people_dont_feel_identified_or_find/"
comment = "http://www.reddit.com/r/askscience/comments/10kp2h\
/lots_of_people_dont_feel_identified_or_find/c6ec6hf"
Distinguishing a comment from a post can be done with a regex, but if there's a better way I will use that.
So my question is: what is the best way to determine what kind of Reddit URL I have, and how do I get the contents at that URL?
What I tried so far:
post = praw.objects.Submission.get_info(r, url).selftext
# returns the post's selftext even when the URL is a permalink to a comment
comment_text = praw.objects.?????()  # how do I do this?
Thanks in advance.

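import praw
r = praw.Reddit('<USERAGENT>')
comment_url = ('http://www.reddit.com/r/askscience/comments/10kp2h'
               '/lots_of_people_dont_feel_identified_or_find/c6ec6hf')
# get_submission is the PRAW 3.x API; for a comment permalink the returned
# submission's comment tree starts at the linked comment
comment = r.get_submission(comment_url).comments[0]
print(comment.body)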
My other answers on related questions should provide additional useful information.
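One way to tell a post URL from a comment URL without a free-form regex is to split the URL path into segments. This is a minimal sketch, assuming the permalink layout seen in the question's URLs (`/r/<subreddit>/comments/<post_id>/<slug>/<comment_id>`); `classify_reddit_url` is a hypothetical helper, not part of PRAW.

```python
from urllib.parse import urlparse


def classify_reddit_url(url):
    """Return ("post", post_id), ("comment", comment_id) or None.

    Assumes the permalink layout /r/<sub>/comments/<post_id>/<slug>/<comment_id>.
    """
    # Drop empty segments produced by leading/trailing slashes; query strings
    # such as "?context=3" are not part of .path, so they are ignored.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 4 or parts[0] != "r" or parts[2] != "comments":
        return None
    if len(parts) >= 6:
        # A sixth segment after the slug is the comment id
        return ("comment", parts[5])
    return ("post", parts[3])


post = ("http://www.reddit.com/r/askscience/comments/10kp2h"
        "/lots_of_people_dont_feel_identified_or_find/")
print(classify_reddit_url(post))
print(classify_reddit_url(post + "c6ec6hf"))
```

In PRAW 4 and later, the returned id could then be passed to `reddit.submission(id=...)` or `reddit.comment(id=...)`, reading the text from `.selftext` or `.body` respectively.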


Get an article abstract as a text string using Biopython

I'm using Biopython in my code and I need to extract the abstract from articles. To search for an article I'm using this function:
from Bio import Entrez
def search(query):
    Entrez.email = 'your.email@example.com'
    handle = Entrez.esearch(db='pubmed',
                            sort='relevance',
                            retmax='20',
                            retmode='xml',
                            term=query)
    results = Entrez.read(handle)
    return results
I'm looking for the simplest way to get the abstract as a string after searching for the article (I'm only aiming for one result, since I search by PMID).
Cheers
Try using metapub:
from metapub import PubMedFetcher
fetch = PubMedFetcher()
article = fetch.article_by_pmid('31326596')  # look the article up by its PMID
print(article.abstract)  # the abstract text
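If you'd rather stay with Biopython, `Entrez.efetch` can return the full PubMed record for a PMID as XML, and `Entrez.read` parses it into nested dicts and lists. This is a minimal sketch, assuming the structure recent Biopython versions return for `db='pubmed'` (a `'PubmedArticle'` list, with the abstract under MedlineCitation / Article / Abstract / AbstractText); `fetch_abstract` and `abstract_from_record` are hypothetical helper names.

```python
def abstract_from_record(record):
    """Join the AbstractText parts of the first article in a parsed efetch record."""
    articles = record.get("PubmedArticle", [])
    if not articles:
        return ""
    article = articles[0]["MedlineCitation"]["Article"]
    abstract = article.get("Abstract")
    if abstract is None:  # some PubMed entries have no abstract at all
        return ""
    # Structured abstracts come as several labelled AbstractText sections
    return " ".join(str(part) for part in abstract["AbstractText"])


def fetch_abstract(pmid, email):
    """Fetch one PubMed record by PMID and return its abstract (needs network access)."""
    # Imported here so abstract_from_record can be used without Biopython installed
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=pmid, retmode="xml")
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return abstract_from_record(record)
```

Usage would be along the lines of `print(fetch_abstract('31326596', 'your.email@example.com'))`. Splitting the parsing into its own function keeps the dictionary navigation testable without a network call.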

Link encryption?

I have been stuck on a problem for a few hours. Nothing online has helped and I'm losing the will to live right now.
The site loads up a question with no hints and asks you to find a secret code.
Here's the brief explanation of it:
'Well done on making it to the secret bonus challenge! Our agents have been struggling to deal with a hacker obsessed with clocks and timing. He set up an elaborate collection of pages with content that changes based on a timer. We've replicated it below, can you figure out how to get the secret code?'
The challenge contains several links; each one opens a new page containing what looks like a random string, and I can't see much of a pattern. The links are below:
https://assess.joincyberdiscovery.com/challenge-files/clock-pt1?verify=BY%2F8lhw%2BtbBgvOMDiHeB5A%3D%3D
https://assess.joincyberdiscovery.com/challenge-files/clock-pt2?verify=BY%2F8lhw%2BtbBgvOMDiHeB5A%3D%3D
https://assess.joincyberdiscovery.com/challenge-files/clock-pt3?verify=BY%2F8lhw%2BtbBgvOMDiHeB5A%3D%3D
https://assess.joincyberdiscovery.com/challenge-files/clock-pt4?verify=BY%2F8lhw%2BtbBgvOMDiHeB5A%3D%3D
https://assess.joincyberdiscovery.com/challenge-files/clock-pt5?verify=BY%2F8lhw%2BtbBgvOMDiHeB5A%3D%3D
(In case the links don't open for you: each page contains just a single tag and no other elements, holding what seems to be a three-character code that always ends in 'a', for example 'Aja'. A new code is generated every 10 seconds, and it is not generated client-side.)
Does anyone know whether the link itself is a hint, i.e. whether it is encrypted? I decoded it once and got:
'https://assess.joincyberdiscovery.com/challenge-files/clock-pt5?verify=BY/8lhw tbBgvOMDiHeB5A==' which isn't much help.
Anyway, does anyone have any suggestions?
Thanks :)
It's not impossible. Here is a solution:
import requests
# Replace this verify token with the one from your own challenge links
VERIFY = "wMHfxKSix2qSPJtLe6U98w%3D%3D"
BASE = "https://assess.joincyberdiscovery.com/challenge-files/"
# The pages change on a timer, so fetch all five parts in quick succession
# and concatenate their text into one code
code = "".join(requests.get(f"{BASE}clock-pt{n}?verify={VERIFY}").text
               for n in range(1, 6))
# Submit the combined code to the flag endpoint
flag = requests.get(f"{BASE}get-flag?verify={VERIFY}&string={code}")
print(flag.text)
Replace the verify token in the links with the one you are given.
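On whether the verify link is encrypted: the value appears to be percent-encoded (URL-encoded) rather than encrypted, which is what the "decrypted" link in the question shows. A minimal check with Python's standard library, using the verify value from the question's links: `unquote` reverses the percent-encoding, and the result has the shape of a base64 string (trailing `==` padding) that decodes to 16 raw bytes. That suggests an opaque token rather than a hint, though this is an inference; the answer's links also use a different verify value from the question's. The space in the question's decoded link is probably a `+` that a form-style decoder turned into a space.

```python
import base64
from urllib.parse import unquote

verify = "BY%2F8lhw%2BtbBgvOMDiHeB5A%3D%3D"

# Percent-decoding: %2F -> "/", %2B -> "+", %3D -> "="
token = unquote(verify)
print(token)  # BY/8lhw+tbBgvOMDiHeB5A==

# The decoded value is valid base64; count the raw bytes it encodes
raw = base64.b64decode(token)
print(len(raw))  # 16
```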

How to compile custom format ini file with redirects?

I'm working with an application that has 3 ini files in a somewhat irritating custom format. I'm trying to compile these into a 'standard' ini file.
I'm hoping for some inspiration in the form of pseudocode to help me code some sort of 'compiler'.
Here's an example of one of these ini files. Angle brackets (<...>) indicate a redirect to another section in the file. These redirects can be recursive, i.e. one redirect can point to another redirect. A redirect can also point to an external file, in which case it has three values. Comments start with a # symbol.
[PrimaryServer]
name = DEMO1
baseUrl = http://demo1.awesome.com
[SecondaryServer]
name = DEMO2
baseUrl = http://demo2.awesome.com
[LoginUrl]
# This is a standard redirect
baseLoginUrl = <PrimaryServer:baseUrl>
# This is a redirect appended with extra information
fullLoginUrl = <PrimaryServer:baseUrl>/login.php
# Here's a redirect that points to another redirect
enableSSL = <SSLConfiguration:enableSSL>
# This is a key that has multiple comma-separated values, some of which are redirects.
serverNames = <PrimaryServer:name>,<SecondaryServer:name>,AdditionalRandomServerName
# This one is particularly nasty. It's a redirect to another file...
authenticationMechanism = <Authenication.ini:Mechanisms:PrimaryMechanism>
[SSLConfiguration]
enableSSL = <SSLCertificates:isCertificateInstalled>
[SSLCertificates]
isCertificateInstalled = true
Here's an example of what I'm trying to achieve. I've removed the comments for readability.
[PrimaryServer]
name = DEMO1
baseUrl = http://demo1.awesome.com
[SecondaryServer]
name = DEMO2
baseUrl = http://demo2.awesome.com
[LoginUrl]
baseLoginUrl = http://demo1.awesome.com
fullLoginUrl = http://demo1.awesome.com/login.php
enableSSL = true
serverNames = DEMO1,DEMO2,AdditionalRandomServerName
authenticationMechanism = valueFromExternalFile
[SSLConfiguration]
enableSSL = true
[SSLCertificates]
isCertificateInstalled = true
I'm looking at using ini4j (Java) to achieve this, but am by no means fixed on using that language.
My main questions are:
1) How can I handle the recursive redirects?
2) What is the best way to handle redirects that are combined with additional strings, e.g. serverNames?
3) Bonus points for any suggestions on how to handle the external redirects. It's no big deal if that part isn't working yet.
So far, I'm able to parse and tidy up the file, but I'm struggling with these redirects.
Once again, I'm only hoping for pseudocode. Perhaps I need more coffee, but I'm really puzzled by this one.
Thanks in advance for any suggestions.
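The question asks for pseudocode; below is a Python sketch of one possible approach (the asker mentioned ini4j, so this swaps in Python's standard `configparser`). Each value is resolved recursively: every `<Section:key>` or `<File.ini:Section:key>` inside a value is replaced by the resolved value it points to. That single rule covers plain redirects, redirects mixed with literal text (`fullLoginUrl`, `serverNames`) and chains of redirects, while a stack of the keys currently being resolved detects circular redirects. It assumes section, key and file names never contain `:`, `<` or `>`, and `load` is a hypothetical callback that returns a file's text given its name.

```python
import configparser
import re
from typing import Callable, Dict, Tuple

# <Section:key> or <File.ini:Section:key>; no part may contain ":", "<" or ">"
REF = re.compile(r"<([^<>:]+):([^<>:]+)(?::([^<>:]+))?>")

Node = Tuple[str, str, str]  # (file name, section, key)


def compile_ini(name: str, load: Callable[[str], str]) -> Dict[str, Dict[str, str]]:
    """Return every section of file `name` with all redirects resolved.

    `load(file_name)` must return the raw text of an ini file; it is only
    called for files that are actually referenced.
    """
    parsed: Dict[str, configparser.ConfigParser] = {}

    def get(fname: str) -> configparser.ConfigParser:
        if fname not in parsed:
            # No %-interpolation; only "=" separates keys from values in this format
            cp = configparser.ConfigParser(interpolation=None, delimiters=("=",))
            cp.optionxform = str  # keep key case (the default lower-cases keys)
            cp.read_string(load(fname))
            parsed[fname] = cp
        return parsed[fname]

    def resolve(fname: str, section: str, key: str, stack: Tuple[Node, ...]) -> str:
        node = (fname, section, key)
        if node in stack:  # already resolving this key further up: a cycle
            chain = " -> ".join(":".join(n) for n in stack + (node,))
            raise ValueError("circular redirect: " + chain)
        raw = get(fname)[section][key]

        def substitute(m):
            first, second, third = m.groups()
            if third is None:  # <Section:key> in the same file
                return resolve(fname, first, second, stack + (node,))
            return resolve(first, second, third, stack + (node,))  # external file

        # re.sub replaces every redirect in the value and keeps literal text,
        # e.g. "<A:x>,<B:y>,Literal" or "<A:url>/login.php"
        return REF.sub(substitute, raw)

    cp = get(name)
    return {s: {k: resolve(name, s, k, ()) for k in cp[s]} for s in cp.sections()}
```

For example, `compile_ini("main.ini", lambda f: Path(f).read_text())` (with `pathlib.Path`) would read files from the working directory. Resolving on demand like this means the order of sections in the file does not matter.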

What is the syntax of a jsessionid in a URL?

I'm making a web crawler with Python and I sometimes find ";jsessionid=XXXX" in URLs. I have written a function to remove it: it takes a URL and deletes the pattern ";jsessionid=XXXX...", where "XXXX..." matches anything up to a question mark. I'm not sure the algorithm is correct, because I don't know the exact syntax of jsessionid="...".
Anyway, here is my function. Could you please tell me whether it's correct, or where I can find the syntax of the session ID?
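import re
def deleteJSessionid(link):
    print("originalLink:", link)
    # ";jsessionid=" followed by everything up to the next "?" (or the end)
    match = re.search(r';jsessionid=[^?]*', link, re.IGNORECASE)
    if match is None:
        # No session id in this URL: return it unchanged instead of crashing
        return link
    print("\n\n" + match.group() + "\n\n")
    start, end = match.span()
    return link[:start] + link[end:]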
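Java servlet containers that track sessions by URL rewriting append the session ID to the path as a path parameter, `;jsessionid=<id>`, before any `?query` or `#fragment`. The ID format itself is container-specific (often alphanumeric, sometimes with a `.route` suffix), so it is safer to stop at URL delimiters than to assume a fixed character set. This is a minimal sketch under that assumption; `strip_jsessionid` is a hypothetical name. It only edits the path component, so a `;jsessionid` appearing in the query or fragment is left alone, and it also stops at `#`, which `[^?]*` would swallow.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# ";jsessionid=" plus everything up to the next ";", "/", "?" or "#"
_JSESSIONID = re.compile(r";jsessionid=[^;/?#]*", re.IGNORECASE)


def strip_jsessionid(url: str) -> str:
    """Remove a ;jsessionid=... path parameter from a URL, leaving the rest intact."""
    # urlsplit (unlike urlparse) keeps ";params" inside .path
    parts = urlsplit(url)
    path = _JSESSIONID.sub("", parts.path)
    return urlunsplit(parts._replace(path=path))


print(strip_jsessionid("http://example.com/shop/cart.jsp;jsessionid=ABC123?item=5#top"))
```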

Google Apps Script: stumped on extracting the 'title' from a forum HTML page and pasting it into a spreadsheet (my code inside)

I'm extremely new to this, and I've been trying to get the title of each unique forum page (or topic). Here is the code I have so far:
function GraalGet() {
  // Parse the forums post by post and extract the <title> from each HTML page
  var sheet = SpreadsheetApp.getActiveSheet();
  var i = 31;
  var url = "http://www.graalians.com/forums/showthread.php?p=" + i;
  //var params = {method : "post"}; can this be used at all?
  // The aim: loop this once it can get one result.
  var geturl = UrlFetchApp.fetch(url).getContentText(); // maybe .getContentText should be elsewhere?
  var parseurl = Xml.parse(geturl, true); // confirmed: this must be true, because it won't parse HTML if false
  var titleinfo = parseurl.getElement().getElement("html"); //.getElement('body');//.getElements("title");
  sheet.appendRow([titleinfo, i]);
}
In addition, the script should write the topic number into the adjacent cell.
There are a lot of answered questions about extracting XML data, but this case involves parsing HTML and I couldn't find any results. I'm honestly stumped, and any help finding and extracting the tag would be appreciated. (If you have the time, please feel free to explain as well, but I'll be thankful for any help.)
For reference I have used these:
Google's Kevin Bacon Script
The author's comments on bugs in the script, with some explanation
I'm sorry if I'm being pedantic; this is my first post and I don't want to anger anyone. Please do tell me if I've broken any rules and I'll do my best to fix them. I've also left in the comments I wrote for myself, for your perusal.
You can use Logger.log to print debugging information. I did this with your function and found that the title tag is nested inside the head tag, so you should use something like the code below. Also, getElement returns an XmlElement object, which you should convert to a string using getText().
var titleinfo = parseurl.getElement().getElement('head').getElement('title');
sheet.appendRow([titleinfo.getText(), i]);
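The same idea (find the `<title>` element inside `<head>` and take its text rather than the element object) can be sketched outside Apps Script. This is an illustration in Python using the standard library's `html.parser`, a swapped-in technique rather than Apps Script code; `TitleParser` and `extract_title` are hypothetical names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting at the first <title>; ignore any later ones
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text, or None if the page has no title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


print(extract_title("<html><head><title> Graal Forums - Topic 31 </title></head></html>"))
```

Like `getText()` in the answer, this returns the element's text, which is what should be written to the spreadsheet cell.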
