Using lxml with Beautiful Soup

I'm having trouble getting lxml to work with Beautiful Soup on OS X 10.8.4. I installed lxml with port install py25-lxml, which completed without errors, but I now get this error when I try to use lxml with Beautiful Soup:
Traceback (most recent call last):
File "********.py", line 13, in <module>
soup = BeautifulSoup(urllib2.urlopen(url).read(), 'lxml')
File "/Users/********/********/bs4/__init__.py", line 155, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml.
Do you need to install a parser library?
I'm not sure whether this is part of the problem, but I'm also unable to import lxml.etree.
Has anyone else gotten lxml to work with Beautiful Soup on OS X?
Alternatively, I could try a different HTML parser. Does anyone have suggestions for other parsers?

From the lxml website:
If this fails, attempt to build it yourself:
http://lxml.de/build.html#building-lxml-on-macos-x
This may not work, so don't rely on it.
Otherwise, there are other parsers, such as lxml.html (which should work with lxml), and a few others that I'm not sure of.
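As a fallback while the lxml install is being sorted out, Beautiful Soup can use Python's built-in parser, which it calls "html.parser". The sketch below assumes Python 3 and bs4; the helper names pick_parser and make_soup are made up for illustration. It checks whether lxml.etree imports (the import that fails in the question) and picks a parser name accordingly.

```python
def pick_parser():
    """Return "lxml" if lxml.etree can be imported, else "html.parser"."""
    try:
        import lxml.etree  # noqa: F401  (the import that fails in the question)
    except ImportError:
        return "html.parser"  # pure-Python parser from the standard library
    return "lxml"


def make_soup(markup):
    """Hypothetical helper: parse markup with whichever parser is usable."""
    from bs4 import BeautifulSoup  # imported here so pick_parser() works without bs4
    return BeautifulSoup(markup, pick_parser())


print(pick_parser())
```

If "html.parser" turns out to be too strict or too slow for the pages involved, html5lib is another parser that Beautiful Soup supports, although it is a separate install.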

Related

I'm trying to compile a PDF in R Markdown, but it generates the following error

I'm working in R Markdown and trying to compile to PDF, but the console shows me an error.
! Package inputenc Error: Unicode character ^^S (U+0013)
(inputenc) not set up for use with LaTeX.
Try other LaTeX engines instead (e.g., xelatex) if you are using pdflatex. See https://bookdown.org/yihui/rmarkdown-cookbook/latex-unicode.html
Error: LaTeX failed to compile Modelos-de-volatilidad.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See Modelos-de-volatilidad.log for more info.
Ejecución interrumpida
I have MiKTeX installed.

Look for the error on the line where it occurs. There is surely a character there that LaTeX cannot process. Comment out or cut that line and run again. I have had a similar issue when copying and pasting from a PDF into LaTeX.
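Rather than commenting out lines one at a time, the offending character can be located with a short script. The following is a minimal sketch, not part of R Markdown or TinyTeX: it reports every control character (such as the U+0013 in the error message) with its line and column. The function name find_control_chars is made up for illustration.

```python
import unicodedata

# Control characters that are normal in text files and should not be reported.
ALLOWED = {"\t", "\r"}


def find_control_chars(text):
    """Return a list of (line, column, codepoint) for stray control characters."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on some control characters.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED:
                hits.append((line_no, col_no, "U+%04X" % ord(ch)))
    return hits


# Small self-contained demonstration with an embedded U+0013.
print(find_control_chars("ok\nbad\x13here\n"))
```

To check a real document, read the source file with open(path, encoding="utf-8").read() and pass the result to the function; the path is whatever your .Rmd file is called.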

Import Pcaml grammar to extend OCaml's printer using camlp5

I want to create a printer extension for OCaml using camlp5. My code would look like the example of this tutorial but instead of creating my own extension of the grammar, I would like to use OCaml's grammar to parse a program.
For that, I would like to use the Pcaml module to parse the given string with OCaml's grammar. Unfortunately, each time I try to use it, I get the error:
Required module 'Pcaml' is unavailable
This is the part of my code where I load and open modules, as well as part of the code that uses Pcaml:
#load "pa_extprint.cmo";;
#load "q_MLast.cmo";;
#load "pa_o.cmo";;
open Pcaml;;
open Pprintf;;
let pa_ocaml = Grammar.Entry.create Pcaml.gram "pcaml_gram";;
I tried multiple commands to run the program, for example:
ocamlc -pp camlp5o -I +camlp5 gramlib.cma <my_file>.ml
What do I need to be able to use Pcaml and Pcaml.gram?

I recommend using ocamlfind to build and link your programs. The only reason a newcomer might avoid it is that things can become buggy on Windows without WSL. The following command compiles the code below without error:
ocamlfind c -syntax camlp5o -package camlp5 -linkpkg a.ml
#load "pa_extprint.cmo";;
#load "q_MLast.cmo";;
#load "pa_o.cmo";;
open Pcaml;;
open Pprintf;;
let pa_ocaml : int Grammar.Entry.e = Grammar.Entry.create Pcaml.gram "pcaml_gram";;
FYI, your #load directives can and should be replaced by specifying the right ocamlfind packages.

Which compression types support chunking in dask?

When processing a large single file, it can be broken up as so:
import dask.bag as db
my_file = db.read_text('filename', blocksize=int(1e7))
This works great, but the files I'm working with have a high level of redundancy and so we keep them compressed. Passing in compressed gzip files gives an error that seeking in gzip isn't supported and so it can't be read in blocks.
The documentation here http://dask.pydata.org/en/latest/bytes.html#compression suggests that some formats support random access.
The relevant internal code I think is here:
https://github.com/dask/dask/blob/master/dask/bytes/compression.py#L47
It looks like lzma might support it, but it's been commented out.
Adding lzma into the seekable_files dict like in the commented out code:
from dask.bytes.compression import seekable_files
import lzmaffi
seekable_files['xz'] = lzmaffi.LZMAFile
data = db.read_text('myfile.jsonl.lzma', blocksize=int(1e7), compression='xz')
Throws the following error:
Traceback (most recent call last):
File "example.py", line 8, in <module>
data = bag.read_text('myfile.jsonl.lzma', blocksize=int(1e7), compression='xz')
File "condadir/lib/python3.5/site-packages/dask/bag/text.py", line 80, in read_text
**(storage_options or {}))
File "condadir/lib/python3.5/site-packages/dask/bytes/core.py", line 162, in read_bytes
size = fs.logical_size(path, compression)
File "condadir/lib/python3.5/site-packages/dask/bytes/core.py", line 500, in logical_size
g.seek(0, 2)
io.UnsupportedOperation: seek
I assume that the functions at the bottom of the file (get_xz_blocks, for example) can be used for this, but they don't seem to be in use anywhere in the dask project.
Are there compression libraries that do support this seeking and chunking? If so, how can they be added?

Yes, you are right that the xz format can be useful to you. The confusion is that the file may be block-formatted, but the standard implementation lzmaffi.LZMAFile (or lzma) does not make use of this blocking. Note that block formatting is optional for xz files; it can be enabled, e.g., by using --block-size=size with xz-utils.
The function compression.get_xz_blocks will give you the set of blocks in a file by reading the header only, rather than the whole file, and you could use this in combination with delayed, essentially repeating some of the logic in read_text. We have not put in the time to make this seamless; the same pattern could be used to write blocked xz files too.
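To illustrate why independently compressed pieces make chunked reads possible, here is a standard-library sketch. It is not dask's get_xz_blocks and does not parse xz block headers. As a simplifying assumption, it writes each chunk as its own xz stream and keeps a separate index of offsets. The names write_chunked_xz and read_chunk are made up for illustration.

```python
import lzma


def write_chunked_xz(chunks):
    """Compress each chunk as its own xz stream.

    Returns the concatenated bytes and an index of (offset, length) pairs.
    """
    blob = bytearray()
    index = []
    for chunk in chunks:
        stream = lzma.compress(chunk, format=lzma.FORMAT_XZ)
        index.append((len(blob), len(stream)))
        blob += stream
    return bytes(blob), index


def read_chunk(blob, index, i):
    """Decompress only the i-th chunk, without touching the others."""
    offset, length = index[i]
    return lzma.decompress(blob[offset:offset + length])


chunks = [b'{"a": 1}\n' * 100, b'{"b": 2}\n' * 100]
blob, index = write_chunked_xz(chunks)
print(len(read_chunk(blob, index, 1)))
```

Because each call to read_chunk is independent, each one could be wrapped in dask.delayed to process the chunks in parallel. The concatenated bytes still decompress as a whole with ordinary xz tools.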

Error while generating make latexpdf with sphinx

I am currently using Sphinx 1.4.9 to create documents. When I run make latexpdf, I get the following error.
(/usr/share/texmf/tex/latex/upquote/upquote.sty)
(/usr/share/texmf/tex/latex/float/float.sty)
(/usr/share/texmf/tex/latex/graphics/graphicx.sty
(/usr/share/texmf/tex/latex/graphics/graphics.sty
(/usr/share/texmf/tex/latex/graphics/trig.sty)
(/usr/lib/texmf/tex/latex/config/graphics.cfg)))
(/usr/share/texmf/tex/plain/misc/pdfcolor.tex)
(/usr/share/texmf/tex/latex/hyperref/hyperref.sty
(/usr/share/texmf/tex/latex/hyperref/pd1enc.def)
(/usr/lib/texmf/tex/latex/config/hyperref.cfg)
(/usr/share/texmf/tex/latex/oberdiek/kvoptions.sty)
! Package keyval Error: pdfencoding undefined.
See the keyval package documentation for explanation.
I have searched for pdfencoding in the Sphinx egg, and it only appears in sphinx.sty.
But I don't know how to define pdfencoding, or whether I need to edit conf.py or do something else.

Your hyperref is outdated. The Sphinx 1.4.x series was tested to work with Ubuntu Precise (Debian/TeXLive 2009). I cannot try it, but here is a hack which may help you out. However, other parts may fail, as your TeX install is really old.
Put
'passoptionstopackages' : """
\\let\\originalPassOptionsToPackage\\PassOptionsToPackage
\\makeatletter
\\def\\PassOptionsToPackage#1{%
\\def\\@tempa{#1}\\def\\@tempb{pdfencoding=unicode}%
\\ifx\\@tempa\\@tempb\\expandafter\\@gobbletwo
\\else\\expandafter\\originalPassOptionsToPackage\\fi {#1}}
\\makeatother
""",
inside the latex_elements configuration variable of conf.py. It might work.
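For orientation, this is roughly how the entry would sit in conf.py. It is a sketch under two assumptions: a raw string is used, so each LaTeX backslash is written once rather than doubled, and the macros are the standard LaTeX internals \@tempa, \@tempb and \@gobbletwo that \makeatletter makes accessible. Any other latex_elements keys you already have stay alongside it.

```python
# conf.py (fragment). Sketch only; merge with any existing latex_elements keys.
latex_elements = {
    # Raw string: backslashes reach LaTeX unchanged, so no doubling is needed.
    'passoptionstopackages': r"""
\let\originalPassOptionsToPackage\PassOptionsToPackage
\makeatletter
\def\PassOptionsToPackage#1{%
  \def\@tempa{#1}\def\@tempb{pdfencoding=unicode}%
  \ifx\@tempa\@tempb\expandafter\@gobbletwo
  \else\expandafter\originalPassOptionsToPackage\fi {#1}}
\makeatother
""",
}
```

The redefinition swallows only the pdfencoding=unicode option that the old hyperref does not understand and forwards every other option to the original \PassOptionsToPackage.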

Clang_complete not working

Unfortunately, I can't manage to get clang_complete working and I could use your help.
I've already compiled vim 7.4 with python support. Here is the output of vim --version | grep python:
+cryptv +linebreak +python/dyn +viminfo
-cscope +lispindent +python3/dyn +vreplace
I followed this guide: https://vtluug.org/wiki/Clang_Complete
Please note that I've started from a clean installation (i.e. no other plugins and no further entries in my .vimrc (except for those shown in the guide above)).
According to the tutorials I've seen so far everything should be working.
However, if I try to get code completion for the following example, nothing happens. If I press <c-x><c-u>, I receive the error "completefunc not set".
#include <string>
int main()
{
std::string s;
s.
}
Moreover, I've installed the newest version of clang from source and it is in my $PATH.
Is there a way to verify that clang_complete is actually installed?
What might cause this problem?
Any help is much appreciated.

Add
filetype plugin indent on
to your vimrc; it's missing from the vimrc snippet in the link. This tells Vim to do filetype detection and fire the autocommands related to those file types. Without it, the following autocommands won't run:
au FileType c,cpp,objc,objcpp call <SID>ClangCompleteInit()
au FileType c.*,cpp.*,objc.*,objcpp.* call <SID>ClangCompleteInit()
These probably initialize clang_complete.
