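from bs4 import BeautifulSoup
import requests
import html5lib  # imported explicitly here; BeautifulSoup also loads the parser by name

url = 'https://twitter.com/st3phensparkman'
result = requests.get(url)
# Parse the raw HTML of the response with the html5lib parser
doc = BeautifulSoup(result.text, 'html5lib')
# Look for text nodes that are exactly the string 'Followers'
followers = doc.find_all(text='Followers')
print(followers)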
For some insight: I've been coding more and more recently (I'm new to the game), but I've become stumped when dealing with web scraping. After countless tries I managed to build one scraper; now I'm aiming to create my own (without copying a YouTuber's code). My project aims to find how many followers my friends and I have on Twitter.
I've been using BeautifulSoup, requests, and of course the built-in HTML parser.
I didn't get far before a problem arose. When I try to locate the first tag/string, the program runs successfully but returns only empty brackets.
Searching for answers, I've found that it could be the parser: people online have said that it isn't built for all the HTML documents used by big websites.
A substitute parser, html5lib, is supposed to work. After downloading it, though, my program raises an error saying it can't find that module!
A solution to either of these problems should set me on the right track. Is there a way to make it return the true value (not empty brackets)? Or should I use html5lib, and if so, why can't my computer find the module?
FYI: I'm running the program inside a venv. I suspect this is the cause of the html5lib issue, but after checking in the console I've concluded that the package is indeed downloaded and up to date.
I tried using html5lib in order to make the tags/strings appear in the brackets.
This created a new problem: "Cannot find module".
P.S. I can't include screenshots, so I pasted the html5lib code at the top. My other bit of code is exactly the same, except it doesn't contain "import html5lib" and I replaced the parser accordingly.
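One way to test the venv suspicion is to ask the interpreter that actually runs the script where it lives and whether it can see the module. The following is only a minimal diagnostic sketch using the standard library; the helper name describe_module is made up for illustration and is not part of BeautifulSoup or html5lib.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether `name` is importable from it."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        # sys.prefix differs from sys.base_prefix when a venv is active
        "in_venv": sys.prefix != sys.base_prefix,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run this with the same interpreter/IDE configuration that runs the scraper
    for key, value in describe_module("html5lib").items():
        print(f"{key}: {value}")
```

If "found" is False while the console reports the package as installed, the console and the program are probably using different interpreters. In that case, installing with "python -m pip install html5lib" from the interpreter printed above would put the package where the program looks for it.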
I would like to build an IDE in the browser. Basically, a web app where you choose which language you want to write code in, submit the code to an endpoint, and get back the output.
I could not find any existing Docker images, and I tried to use Judge, but it seems likely to cause issues and there is no customer support.
Has anyone built something like this before?
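As a rough illustration of the "submit code, get back the output" step described above, here is a minimal sketch of what the endpoint could do internally: write the submission to a temporary file, run it in a child process with a timeout, and return the captured output. The names RUNNERS and run_submission are hypothetical, only Python is wired up, and nothing here is sandboxed; a real service would need proper isolation (containers or similar) before running untrusted code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical mapping from a language name to the command that runs a source file
RUNNERS = {
    "python": [sys.executable],
}


def run_submission(language, source, timeout=5.0):
    """Run `source` in a child process and return its exit code and output."""
    command = RUNNERS[language]
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(source, encoding="utf-8")
        try:
            completed = subprocess.run(
                command + [str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,  # stop runaway submissions
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return {"exit_code": None, "stdout": "", "stderr": "timed out"}
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

A web handler would call run_submission with the posted language and source and return the resulting dictionary as JSON.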
I'm working on automated tests using Appium with Robot Framework on an Android device, with scheduled runs on Jenkins. My test flow enters some data on page A and submits it, switches to page B to check the result, then switches back to page A to enter new data. I repeat this loop 10+ times. Everything works fine for around 4-5 rounds, but after that an error shows up:
StaleElementReferenceException: Message: Cached element 'By.xpath:
//android.widget.TextView[@text='Limit']' do not exists in DOM anymore
The TextView is on page A. I watched the run and saw that the TextView was displayed, but the robot did not see it. I tried restarting the device, but that did not solve the problem. I searched the internet and found others facing the same issue, but they use different programming languages such as Java or Python. I have no idea what to do next.
Development Tools :
Appium version: 1.21.0
Robot Framework version: 4.1.2 (Python 3.10.0 on win32)
First, I do not use Robot Framework, but the code should be similar according to this: https://robocorp.com/docs/languages-and-frameworks/robot-framework/try-except-finally-exception-catching-and-handling.
Second, I'm not sure this is the best way to get around the problem. I think there is something you can do with the expected conditions class to handle it in a "cleaner way", but I'm not familiar enough with it to show you. Instead, what I've done is something like this...
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By

while some_limiting_factor:
    try:
        # logic for submitting page A, assertions for page B
        ...
    except StaleElementReferenceException:
        # the cached reference is stale, so look the element up again
        element = driver.find_element(By.XPATH, "//android.widget.TextView[@text='Limit']")
As much as I want to cache elements in Appium, it seems that the service itself does NOT want you to, at least not in my experience. Getting fresh elements every time seems to ensure a "slow but steady" test. Hopefully someone can show me the deep Appium secrets one day.
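The "get a fresh element every time" approach can be wrapped in a small retry helper. This is only a sketch: retry_on is a made-up name, not a Selenium or Appium API, and it assumes the action passed in performs its own element lookup on every call so that a retry never reuses a stale reference.

```python
import time


def retry_on(exc_types, action, attempts=3, delay=0.0):
    """Call `action()` and retry when it raises one of `exc_types`.

    `action` should look the element up again on every call, so a retry
    never reuses a cached (possibly stale) reference.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except exc_types as error:
            last_error = error
            time.sleep(delay)  # give the page a moment before looking again
    # every attempt failed, so surface the last error to the caller
    raise last_error


# Hypothetical usage with an existing Appium `driver`:
# text = retry_on(
#     StaleElementReferenceException,
#     lambda: driver.find_element(By.XPATH, "//android.widget.TextView[@text='Limit']").text,
# )
```

Passing the exception type as an argument keeps the helper independent of any particular driver library.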
I am trying to use Swagger UI to document our Node.js API, so I went to http://swagger.io/docs/, down to Swagger UI Documentation -> Usage, to find this
Now, this is not the only place that provides these instructions; there are dozens of blogs and tutorials saying the same thing, so that's exactly what I did.
I cloned the repo, went into /dist/, and opened /dist/index.html, and all I get is an empty page with an error:
I'm slowly going crazy now as I can't find anything about it and literally every place I looked just has the same, copied, instructions with nothing else provided (like what could go wrong? you just open a file...)
Any help or explanations are much appreciated!
P.S. for some reason opening the /public/index.html works (mentioned nowhere on the www)
I think this is a bug in the new version of Swagger UI. It is a fresh release, and they are still modifying it and fixing bugs.
Look here: "Swagger-ui cannot access JS scripts". This seems to be a similar problem; maybe it will help you.
I'm using the ElFinder.NET connector in my MVC applications. In one application everything works fine, but the other application can't initialize ElFinder.
The code used to initialize ElFinder is the same in both applications. The problem is probably in the Connector.Process(this.HttpContext.Request) call.
In the application where ElFinder is working, Connector.Process returns a JSON result with correct data; the other application returns a weird result.
I can see in the browser that the request was processed, but the response body contains the string System.Web.Mvc.JsonDataContractResult instead of JSON data. If I step through the code in Visual Studio, I see that Connector.Process returns a JsonDataContractResult, but it's empty.
Well then :)
A possible cause is an outdated Json.NET (Newtonsoft.Json) package version. If you have an older version of this library, simply replace it with the one in ElFinder's package, as I said in my comment.
We have to maintain a lot of classic ASP and VB/ASP.Net applications that link to many different parts of a static website.
The master pages are littered with various
<!-- #include virtual="/site/footer.something" -->
and similar, where there are many many combinations of what /site/ can be.
The problem is that when you try to run one of these sites locally, for debugging and so on, you're almost guaranteed to get a parser error.
What I want to do is come up with a generic handler so that I can just insert a blank file for any #include file that doesn't exist.
I tried to set up a URL Rewrite rule, which works in the browser (it just redirects to an empty HTML file), but I'm guessing the ASP parser doesn't fetch includes via a web request, as it still generates a parser error.
I don't want to have to copy the static content to my workstation every time I open a new app, and I don't want to edit the master pages to exclude the links, as one day I'm just going to forget and deploy something broken.
So the question is: is there a way to serve a default file for these declarations, or some other method?
Edit: To consider a different fix to this problem, is there some way to insert some kind of file-system handler that can pick up requests for missing files in specific locations and return predefined content?
Yeah, I know that's a really offbeat direction and probably a very bad idea in practise, but this is quite a frustrating problem in the office now.
What's irritating is that even though IIS has SSI disabled, the ASP processor still honours #include directives. Is there a way to either disable that, or perhaps some way to override the behaviour in some kind of generated class?
The problem you will encounter is that includes are processed before any of your code runs. The server gathers all of the resources referenced in the scripts then compiles and runs your code. By the time your code is running, the missing include has already thrown a compiler error.
Further, what you're asking could run into other problems. Includes often contain code (procedures, constants, variable declarations, etc.) that other scripts rely on. So even if you were to replace the missing include with an empty file, you may still encounter other parser errors if the including script expects that include to contain specific code.
Probably your best bet is to make a console app or something similar that parses your files looking for the include statements, resolves the relative path based on your directory structure and does what you want - write an empty file if it doesn't exist. You could then run your projects through this parser and at least eliminate that issue.
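The console-app idea above could look roughly like the following Python sketch. It scans source files for #include directives, resolves each target, and writes an empty file where the target is missing. The function names, the default file patterns, and the path rules (virtual paths resolved from a site root you pass in, file paths resolved relative to the including file) are assumptions for illustration and would need adjusting to your actual directory layout.

```python
import re
from pathlib import Path

# Matches <!-- #include virtual="..." --> and <!-- #include file="..." -->
INCLUDE_RE = re.compile(
    r'<!--\s*#include\s+(virtual|file)\s*=\s*"([^"]+)"\s*-->',
    re.IGNORECASE,
)


def find_includes(source_file, site_root):
    """Yield the resolved path of every #include directive in `source_file`."""
    source_file = Path(source_file)
    text = source_file.read_text(encoding="utf-8", errors="ignore")
    for kind, target in INCLUDE_RE.findall(text):
        if kind.lower() == "virtual":
            # assumption: virtual paths map directly onto folders under site_root
            yield Path(site_root) / target.lstrip("/\\")
        else:
            # file paths are relative to the including file's directory
            yield source_file.parent / target


def create_missing_stubs(project_dir, site_root,
                         patterns=("*.asp", "*.aspx", "*.master", "*.inc")):
    """Create an empty file for every include target that does not exist."""
    created = []
    for pattern in patterns:
        for source_file in Path(project_dir).rglob(pattern):
            for target in find_includes(source_file, site_root):
                if not target.exists():
                    target.parent.mkdir(parents=True, exist_ok=True)
                    target.touch()
                    created.append(target)
    return created
```

The function returns the list of stub files it created, which makes it possible to delete them again (or add them to an ignore list) before anything is deployed.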
Additionally, you mention the possibility of accidentally deploying something that you've edited to circumvent this problem. I would assume then, that if you were to write out these "dummy" includes, there is no possibility of you accidentally deploying them and overwriting good files?