Could anyone scrape an element with Jsoup? - parsing

I'm trying to scrape this link using Jsoup with Kotlin/Java, and I'm having trouble scraping the players section (under Current Squad). Could anyone parse it?

You cannot get that information from the response to that link alone.
Instead, you can build JSON objects from the HTTP responses of https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_team_squad/2817 and https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_teamplayer_facts/2817/42556.
For example, in Python you can get the minutes played by each player as follows:
import json
import urllib.request

# Written for Python 3, where urlopen lives in urllib.request
f = urllib.request.urlopen('https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_team_squad/2817')
f2 = urllib.request.urlopen('https://stats.fn.sportradar.com/betsgi/en/America:Argentina:Buenos_Aires/gismo/stats_teamplayer_facts/2817/42556')
j = json.loads(f.read())
j2 = json.loads(f2.read())
plrs = j['doc'][0]['data']['players']
for plr in plrs:
    print('=========================')
    print(plr['name'])
    try:
        # Player stats are keyed by the player's _id as a string
        print('minutes played:' + str(j2['doc'][0]['data'][str(plr['_id'])]['stats']['total']['minutes_played']))
    except KeyError:
        # No stats recorded for this player
        pass


Post-function custom lua code to manipulate JSON response body

I’m trying to write a custom plugin to transform the response body. I could have used the response transformer plugin, but my response body JSON is complex and I only want to remove a few fields from it.
I tried using the post-function plugin to run my custom Lua code, but it doesn’t let me import cjson, so I’m unable to decode the response and remove specific keys from it.
My lua code in body_filter:
local cjson = require("cjson")
local body = cjson.decode(kong.response.get_raw_body())
-- set custom key’s value to 1
body.subKeyFoo.subSubKey = 1;
This is what I get:
require cjson not allowed within sandbox “kong”
The sandbox is enabled to prevent arbitrary Lua code from doing dangerous things. See the docs on how to disable the sandbox: https://docs.konghq.com/gateway/latest/reference/configuration/#untrusted_lua. Check the untrusted_lua_xxx options (3 in total).
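As a rough illustration of those options, here is a kong.conf fragment that keeps the sandbox on and only whitelists the one module, rather than disabling the sandbox entirely. Whether this is enough for the rest of the body_filter code is an assumption; the third option, untrusted_lua_sandbox_environment, is not shown.

```
# kong.conf (sketch; restart Kong after changing it)
# "sandbox" is the default mode; "on" removes the sandbox, "off" disables custom Lua code
untrusted_lua = sandbox
# Comma-separated list of modules that sandboxed code may require()
untrusted_lua_sandbox_requires = cjson
```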

Using python to parse twitter url

I am using the following code but I am not able to extract any information from the url.
from urllib.parse import urlparse
if __name__ == "__main__":
    z = 5
    url = 'https://twitter.com/isro/status/1170331318132957184'
    df = urlparse(url)
    print(df)
ParseResult(scheme='https', netloc='twitter.com', path='/isro/status/1170331318132957184', params='', query='', fragment='')
I was hoping to extract the tweet message, the time of the tweet and other information available from the link, but the code above clearly doesn't achieve that. How do I go about it from here?
I think you may be misunderstanding the purpose of the urllib.parse.urlparse function. From the Python documentation:
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
Parse a URL into six components, returning a 6-item named tuple. This
corresponds to the general structure of a URL:
scheme://netloc/path;parameters?query#fragment
From the result you are seeing in ParseResult, your code is working perfectly - it is breaking your URL up into the component parts.
It sounds as though you actually want to fetch the web content at that URL. In that case, I might take a look at urllib.request.urlopen instead.
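A minimal sketch of that suggestion, assuming a plain HTTP GET is enough. The helper name fetch_text is made up for illustration. Note that Twitter may render tweets with JavaScript or require authentication, so the raw HTML returned for that URL may not contain the tweet text at all; in that case an API client would be needed instead.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Download the resource at `url` and return its body decoded as text.

    `fetch_text` is a hypothetical helper, not part of urllib.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (performs a real network request):
# html = fetch_text('https://twitter.com/isro/status/1170331318132957184')
# print(html[:500])
```

urlparse only splits the URL string; urlopen is what actually contacts the server and returns the content.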

How can I store parsed JSON string data in a local text file in PhoneGap iOS?

How do I store parsed JSON string data in a local text file the first time, and then fetch the data from that local file the next time? This is the code I have written:
$.getJSON("my ruls ",function(data)
In the line above I am parsing JSON from a URL. I need to store that data in a local text file so that the second time, the user reads the data from the local text file only and not from the JSON URL.
Please give me some ideas, suggestions or links to solve this problem.
I am doing this on iPhone.
Thanks & regards
You can write your data using FileWriter in PhoneGap (http://docs.phonegap.com/en/2.5.0/cordova_file_file.md.html#FileWriter)
regards,

Extracting an element from XML with Python3?

I am trying to write a Python 3 script where I am querying a web api and receiving an XML response. The response looks like this –
<?xml version="1.0" encoding="UTF-8"?>
<ipinfo>
  <ip_address>4.2.2.2</ip_address>
  <ip_type>Mapped</ip_type>
  <anonymizer_status/>
  <Network>
    <organization>level 3 communications inc.</organization>
    <OrganizationData>
      <home>false</home>
      <organization_type>Telecommunications</organization_type>
      <naics_code>518219</naics_code>
      <isic_code>J6311</isic_code>
    </OrganizationData>
    <carrier>level 3 communications</carrier>
    <asn>3356</asn>
    <connection_type>tx</connection_type>
    <line_speed>high</line_speed>
    <ip_routing_type>fixed</ip_routing_type>
    <Domain>
      <tld>net</tld>
      <sld>bbnplanet</sld>
    </Domain>
  </Network>
  <Location>
    <continent>north america</continent>
    <CountryData>
      <country>united states</country>
      <country_code>us</country_code>
      <country_cf>99</country_cf>
    </CountryData>
    <region>southwest</region>
    <StateData>
      <state>california</state>
      <state_code>ca</state_code>
      <state_cf>88</state_cf>
    </StateData>
    <dma>803</dma>
    <msa>31100</msa>
    <CityData>
      <city>san juan capistrano</city>
      <postal_code>92675</postal_code>
      <time_zone>-8</time_zone>
      <area_code>949</area_code>
      <city_cf>77</city_cf>
    </CityData>
    <latitude>33.499</latitude>
    <longitude>-117.662</longitude>
  </Location>
</ipinfo>
This is the code I have so far –
import urllib.request
import urllib.error
import sys
import xml.etree.ElementTree as etree
…
try:
    xml = urllib.request.urlopen(targetURL, data=None)
except urllib.error.HTTPError as e:
    print("HTTP error: " + str(e) + " URL: " + targetURL)
    sys.exit()
tree = etree.parse(xml)
root = tree.getroot()
The API query works, and through the debugger I can see all of the information inside the ‘root’ variable. My issue is that I have not been able to figure out how to extract something like the ASN (<asn></asn>) from the returned XML. I’ve been beating my head against this for a day with a wide variety of find, findall and other methods, but I have not been able to crack it. I think I have reached the point where I cannot see the wood for the trees, and none of the examples I have found on the internet seem to help. Can someone show me a code snippet that extracts the contents of an XML element from inside the tree structure?
Many thanks
Tim
I would recommend using Beautiful Soup.
It's very powerful when it comes to extracting data from XML.
Example:
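import urllib.request
from bs4 import BeautifulSoup
# BeautifulSoup parses markup, not URLs, so fetch the response body first
markup = urllib.request.urlopen(targetURL).read()
soup = BeautifulSoup(markup, 'xml')  # the 'xml' parser requires lxml to be installed
soup.find_all('asn')  # Returns all the <asn></asn> tags found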
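For comparison, the same lookup can be done with the ElementTree code already in the question, without an extra library. This is a sketch against a trimmed copy of the response shown above; the SAMPLE string stands in for the real API response, and with the question's code you would use the existing root variable instead.

```python
import xml.etree.ElementTree as etree

# Trimmed stand-in for the API response shown in the question
SAMPLE = """
<ipinfo>
  <ip_address>4.2.2.2</ip_address>
  <Network>
    <carrier>level 3 communications</carrier>
    <asn>3356</asn>
    <Domain>
      <tld>net</tld>
    </Domain>
  </Network>
</ipinfo>
"""

root = etree.fromstring(SAMPLE)

# Paths are relative to the root element (<ipinfo>), so it is not named in the path
asn = root.findtext("Network/asn")

# ".//" searches all descendants, which helps when the exact nesting is unknown
asn_anywhere = root.findtext(".//asn")

# find() returns the Element itself; .text holds its contents
tld = root.find("Network/Domain/tld").text

print(asn, asn_anywhere, tld)
```

A common stumbling block is including the root tag in the path (for example "ipinfo/Network/asn"), which finds nothing because the search already starts at the root element.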

Opening and decompressing an XML URL in Rails

I'm building a Rails app that takes information about products from an XML datafeed hosted on a third-party server. This XML is sent gzipped, and I'm having serious difficulty getting anywhere with it.
I've spent a fair bit of time with Google on this, but the results of my searching seem to be more about Sending Gzipped output rather than receiving a Gzipped input.
The closest I've come to a solution came from StackOverflow, but I'm still getting errors.
What I'm trying to do in the first instance is print the XML data to the browser, then I can start with the processing of it. Here's my current code:
def load_data
  url = "http://xml.domain.com/datafeed/"
  xml_input = Net::HTTP.get(URI.parse(url))
  zstream = Zlib::Inflate.new
  #xml_output = zstream.inflate(xml_input)
  zstream.finish
  zstream.close
end
The error I'm getting from it is:
Zlib::DataError in Cron/get datafeedController#load_data
incorrect header check
I guess this means that the data isn't in the expected format, but I can't find information anywhere about how to do this properly. Two things I've confirmed are that the URL is valid and that the response is gzipped, but I'm stuck on how to get past this.
Any help would be greatly appreciated :-)
Sorted!
file = Net::HTTP.get(URI.parse(url))
gz = Zlib::GzipReader.new(StringIO.new(file))
whole_xml = gz.read
Then to load into Hpricot to do the XML parsing:
hp = Hpricot(whole_xml)
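The likely reason the fix works is that gzip data carries a gzip header, while a plain zlib inflate expects a zlib header by default, hence "incorrect header check". The sketch below shows the same distinction in Python rather than Ruby; the payload is made-up sample data, not the real feed.

```python
import gzip
import zlib

# Made-up stand-in for the gzipped XML feed
payload = b"<products><product id='1'/></products>"
gzipped = gzip.compress(payload)

# A plain zlib inflate expects a zlib header and rejects the gzip one
try:
    zlib.decompress(gzipped)
    raw_zlib_error = None
except zlib.error as exc:
    raw_zlib_error = str(exc)

# A gzip-aware reader handles the header (the GzipReader equivalent)
xml_bytes = gzip.decompress(gzipped)

# Alternatively, tell zlib to expect a gzip wrapper via the wbits argument
xml_bytes_alt = zlib.decompress(gzipped, 16 + zlib.MAX_WBITS)

print(raw_zlib_error)
print(xml_bytes == payload, xml_bytes_alt == payload)
```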
