Using Python to parse a Twitter URL

I am using the following code, but I am not able to extract any information from the URL.
from urllib.parse import urlparse

if __name__ == "__main__":
    z = 5
    url = 'https://twitter.com/isro/status/1170331318132957184'
    df = urlparse(url)
    print(df)
ParseResult(scheme='https', netloc='twitter.com', path='/isro/status/1170331318132957184', params='', query='', fragment='')
I was hoping to extract the tweet message, the time of the tweet and other information available from the link, but the code above clearly doesn't achieve that. How do I go about it from here?

I think you may be misunderstanding the purpose of the urllib.parse.urlparse function. From the Python documentation:
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
Parse a URL into six components, returning a 6-item named tuple. This
corresponds to the general structure of a URL:
scheme://netloc/path;parameters?query#fragment
From the result you are seeing in ParseResult, your code is working perfectly - it is breaking your URL up into the component parts.
It sounds as though you actually want to fetch the web content at that URL. In that case, I might take a look at urllib.request.urlopen instead.
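As a minimal sketch of what urlparse does give you: the path component can be split to recover the account name and the tweet ID that are encoded in the URL itself. The variable names below are illustrative, and this does not fetch the tweet text or timestamp, which would still require a request to the site or its API.

```python
from urllib.parse import urlparse

url = 'https://twitter.com/isro/status/1170331318132957184'
parts = urlparse(url)

# The path is '/isro/status/1170331318132957184'; split it into segments.
segments = parts.path.strip('/').split('/')

# Assumption: the URL follows the /<user>/status/<id> layout shown in the question.
user, tweet_id = segments[0], segments[-1]

print(parts.netloc)
print(user)
print(tweet_id)
```

Anything beyond these components (message, time of tweet) is not part of the URL, so no amount of URL parsing will produce it.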

Related

Trouble parsing CNN search results using Python 3 lxml

I am trying to parse the response from a search on the CNN site like so:
import requests
from lxml import html
from lxml import etree

r = requests.get('https://www.cnn.com/search?q=climate+change')
doc = etree.HTML(r.content)
# select every <a> element that has an href attribute
for url in doc.xpath('//a[@href]'):
    u = url.get('href')
    print(u)
This gives a bunch of links, primarily to different sections on the site, but it gives no links at all to the actual stories returned by the search. What am I doing wrong?

Why are my images received as string? (ROS)

I already found my mistake. Should I delete this question?
I have a very, very simple subscriber node. (Unfortunately, the usual examples found on the internet use Strings, although a book of mine uses Ints.)
The code is
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
import rosbag

def image_callback(msg):
    #print(msg.data.header)
    print(type(msg.data))
    print(len(msg.data))

def image_recorder():
    rospy.init_node('image_recorder', anonymous=True)
    sub = rospy.Subscriber('image_results',Image, image_callback)
    rospy.spin()

if __name__ == '__main__':
    try:
        image_recorder()
    except rospy.ROSInterruptException:
        pass
Now, what is the problem?
The output of this is:
<type 'str'>
1184260
Why? The messages that we are receiving are Images, (that is why I try to do msg.data.header and fail!)
How can I recover the images?
And no, I do not need to use CvBridge to convert them to opencv Images. I just need the ROS images
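The msg passed to the callback is already of type Image, so msg.header is the correct way to access the header.
msg.data is only the raw pixel buffer of that message, which is why it prints as a str and has no header of its own.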
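The distinction can be sketched without a ROS installation. The Header and Image classes below are hypothetical stand-ins that copy only a subset of the field names of std_msgs/Header and sensor_msgs/Image; they are not the real message classes, and the sample values are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins mirroring part of the field layout of
# std_msgs/Header and sensor_msgs/Image, so this runs without ROS.
@dataclass
class Header:
    seq: int = 0
    frame_id: str = ''

@dataclass
class Image:
    header: Header = field(default_factory=Header)
    height: int = 0
    width: int = 0
    encoding: str = ''
    step: int = 0          # bytes per image row
    data: bytes = b''      # raw pixel buffer

def image_callback(msg):
    # msg is the whole Image message: the metadata lives on msg itself.
    print(msg.header.frame_id, msg.width, msg.height, msg.encoding)
    # msg.data is only the byte buffer, so it has a length but no header.
    print(type(msg.data), len(msg.data))
    return msg

# A made-up 3x2 rgb8 image: 3 pixels * 3 bytes = 9 bytes per row.
msg = Image(header=Header(frame_id='camera'), height=2, width=3,
            encoding='rgb8', step=9, data=bytes(2 * 9))
received = image_callback(msg)
```

In the real subscriber, the object to keep or record is therefore msg itself, not msg.data.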

Could anyone scrape an element with Jsoup?

I'm trying to scrape this link using Jsoup with Kotlin/Java, and I have a problem scraping the players section (under Current Squad). Could anyone parse it?
You cannot access the information directly using only the response from that link.
You can make a JSON object with the HTTP response from https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_team_squad/2817 and https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_teamplayer_facts/2817/42556.
As an example in Python 3, you can get the minutes played by each player as follows:
import json
import urllib.request

f = urllib.request.urlopen('https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_team_squad/2817')
f2 = urllib.request.urlopen('https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_teamplayer_facts/2817/42556')
j = json.loads(f.read())
j2 = json.loads(f2.read())
plrs = j['doc'][0]['data']['players']
for plr in plrs:
    print('=========================')
    print(plr['name'])
    try:
        print('minutes played:' + str(j2['doc'][0]['data'][str(plr['_id'])]['stats']['total']['minutes_played']))
    except KeyError:
        # no stats entry for this player; skip it
        pass

How to use HMAC in Lua - Lightroom plugin

First, I have to mention that I'm really new to Lua, so please be patient if you think my question is too dumb.
Here is my requirement:
I need to use HMAC-SHA256 for Lightroom plugin development, as I'm using it for security.
I was trying to use this, but with no luck:
https://code.google.com/p/lua-files/wiki/hmac
These are the steps I followed:
Got the code of https://code.google.com/p/lua-files/source/browse/hmac.lua and saved it as 'hmac.lua' in my plugin directory.
Got the code from https://code.google.com/p/lua-files/source/browse/sha2.lua and saved it as 'sha2.lua'.
Now in the file I use it like this:
local hmac = require'hmac'
local sha2 = require'sha2'
--somewhere down the line inside a function
local hashvalue = hmac.sha2('key', 'message')
Unfortunately this does not work, and I'm not sure what I'm doing wrong.
Can anyone advise me on what I'm doing wrong here? Or is there an easier and better way of doing this, with a good example?
EDIT:
I'm doing this to get the result. When I include that code the plugin stops working, and I cannot get the output string:
hashvalue = hmac.sha2('key', 'message')
local LrLogger = import 'LrLogger'
myLogger = LrLogger('FlaggedFiles')
myLogger:enable("logfile")
myLogger:trace ("=========================================\n")
myLogger:trace ('Winter is coming, ' .. hashvalue)
myLogger:trace ("=========================================\n")
Lightroom then refuses to load the plugin, and there is nothing in the log either.
Thank you very much for your help
I'd first make sure your code works outside of Lightroom. It seems that the HMAC module you referenced has some other dependencies: it requires the "glue", "bit", and "ffi" modules. Of these, bit and ffi are binary modules, and I'm not sure you will be able to load them into Lightroom (unless they are already available there). In any case, you probably won't be able to make it run in LR if you don't have the required modules and can't make it run without issues outside of LR.
If you just need a SHA256 hash, there is a way to do it in Lightroom.
I posted my question here and was able to get an answer, but there was no reference to this in the SDK documentation (Lightroom SDK).
local sha = import 'LrDigest'
d = sha.SHA256.digest ("Hello world")
Unfortunately there was no HMAC, so I decided to use MD5 with a salt because this was taking too much of my time.
Spent quite some time trying to find a solution :-/
LrDigest is not documented, thanks Adobe!
Solution:
local LrDigest = import "LrDigest"
LrDigest.HMAC.digest(string, 'SHA256', key)
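Following the earlier advice to check things outside Lightroom first, a reference HMAC-SHA256 value can be computed with Python's standard library and compared with whatever the plugin produces. This is a minimal sketch: the function name hmac_sha256_manual is made up for illustration, the key and message are the placeholder strings from the question, and whether LrDigest returns its digest in the same hex form is an assumption you would need to check.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 written out per RFC 2104, to show what the construction does."""
    block_size = 64  # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # over-long keys are hashed first
    key = key.ljust(block_size, b'\x00')     # then zero-padded to one block
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).hexdigest()

key, message = b'key', b'message'

# The standard-library implementation, used as the reference value.
reference = hmac.new(key, message, hashlib.sha256).hexdigest()
manual = hmac_sha256_manual(key, message)

print(reference)
print(manual == reference)
```

The manual version is only there to show that HMAC is two nested hashes over a padded key, which is why a plain SHA256 digest of key plus message is not a substitute for it.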

Extracting an element from XML with Python3?

I am trying to write a Python 3 script where I am querying a web api and receiving an XML response. The response looks like this –
<?xml version="1.0" encoding="UTF-8"?>
<ipinfo>
  <ip_address>4.2.2.2</ip_address>
  <ip_type>Mapped</ip_type>
  <anonymizer_status/>
  <Network>
    <organization>level 3 communications inc.</organization>
    <OrganizationData>
      <home>false</home>
      <organization_type>Telecommunications</organization_type>
      <naics_code>518219</naics_code>
      <isic_code>J6311</isic_code>
    </OrganizationData>
    <carrier>level 3 communications</carrier>
    <asn>3356</asn>
    <connection_type>tx</connection_type>
    <line_speed>high</line_speed>
    <ip_routing_type>fixed</ip_routing_type>
    <Domain>
      <tld>net</tld>
      <sld>bbnplanet</sld>
    </Domain>
  </Network>
  <Location>
    <continent>north america</continent>
    <CountryData>
      <country>united states</country>
      <country_code>us</country_code>
      <country_cf>99</country_cf>
    </CountryData>
    <region>southwest</region>
    <StateData>
      <state>california</state>
      <state_code>ca</state_code>
      <state_cf>88</state_cf>
    </StateData>
    <dma>803</dma>
    <msa>31100</msa>
    <CityData>
      <city>san juan capistrano</city>
      <postal_code>92675</postal_code>
      <time_zone>-8</time_zone>
      <area_code>949</area_code>
      <city_cf>77</city_cf>
    </CityData>
    <latitude>33.499</latitude>
    <longitude>-117.662</longitude>
  </Location>
</ipinfo>
This is the code I have so far –
import urllib.request
import urllib.error
import sys
import xml.etree.ElementTree as etree
…
try:
    xml = urllib.request.urlopen(targetURL, data=None)
except urllib.error.HTTPError as e:
    print("HTTP error: " + str(e) + " URL: " + targetURL)
    sys.exit()
tree = etree.parse(xml)
root = tree.getroot()
The API query works, and through the debugger I can see all of the information inside the ‘root’ variable. My issue is that I have not been able to figure out how to extract something like the ASN (<asn></asn>) from the returned XML. I’ve been beating my head against this for a day with a wide variety of find, findall and other methods, but have not been able to crack it. I think I have reached the point where I cannot see the wood for the trees, and every example I have found on the internet doesn’t seem to help. Can someone show me a code snippet which can extract the contents of an XML element from inside the tree structure?
Many thanks
Tim
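I would recommend using Beautiful Soup.
It's very powerful when it comes to extracting data from XML.
Example:
from bs4 import BeautifulSoup
import urllib.request
# BeautifulSoup parses markup, not URLs, so fetch the response body first
xml = urllib.request.urlopen(targetURL).read()
soup = BeautifulSoup(xml, 'xml')  # the 'xml' parser requires lxml to be installed
soup.find_all('asn')  # returns all the <asn></asn> tags found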
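The same lookup also works with the ElementTree root the question already has, without an extra dependency. The sketch below uses a trimmed copy of the response from the question, kept inline (and without the XML declaration) so it runs without calling the API; in the original script, root would come from tree.getroot() instead.

```python
import xml.etree.ElementTree as etree

# Trimmed copy of the response shown in the question.
response = """
<ipinfo>
  <ip_address>4.2.2.2</ip_address>
  <Network>
    <organization>level 3 communications inc.</organization>
    <asn>3356</asn>
  </Network>
  <Location>
    <CityData>
      <city>san juan capistrano</city>
    </CityData>
  </Location>
</ipinfo>
"""

root = etree.fromstring(response.strip())

# Paths are relative to the root element, so <ipinfo> itself is not named.
asn = root.find('Network/asn').text

# './/' searches at any depth, which helps when the nesting is not known.
city = root.findtext('.//city')

print(asn)
print(city)
```

A common reason find appears to return nothing is including the root tag in the path (for example 'ipinfo/Network/asn'), since the search already starts at that element.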
