Logistic regression with two text features - machine-learning

I can only find ways to implement logistic regression with a single text feature (e.g. spam detection), where TF-IDF would be used. Here, however, both Feature_A and Feature_B are needed to predict the label, and to my (limited) knowledge TF-IDF only works with one text feature, so I'd like to know which method to use with two. I'll be using Python and scikit-learn. Help would be greatly appreciated!
Example of the kind of dataset I'll be working with:
| Feature_A | Feature_B | Label |
|---|---|---|
| Lorem ipsum dolor sit amet | consectetur adipiscing elit | 0 |
| Proin venenatis | est sit amet rhoncus efficitur | 1 |
| non bibendum massa nulla nec nulla | Quisque sit amet suscipit ligula | 1 |
| Quisque aliquet lacus non nibh elementum faucibus | posuere, justo eget malesuada porta | 0 |
| justo sem vestibulum felis | ac facilisis ante nulla a justo | 1 |

If the two texts are related in a way you want reflected in the TF-IDF weights, consider concatenating them into a single text feature.
Otherwise, you can encode them separately and hstack the results; a ColumnTransformer does exactly that, see "ColumnTransformer fails with CountVectorizer/HashingVectorizer in a pipeline (multiple textfeatures)".
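A minimal sketch of the separate-encoding route, assuming the data sits in a pandas DataFrame with the column names from the question. The step names `tfidf_a` and `tfidf_b` are made up for illustration. Each column is selected with a plain string rather than a list, so that each TfidfVectorizer receives a 1-D column of strings:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data copied from the question.
df = pd.DataFrame({
    "Feature_A": [
        "Lorem ipsum dolor sit amet",
        "Proin venenatis",
        "non bibendum massa nulla nec nulla",
        "Quisque aliquet lacus non nibh elementum faucibus",
        "justo sem vestibulum felis",
    ],
    "Feature_B": [
        "consectetur adipiscing elit",
        "est sit amet rhoncus efficitur",
        "Quisque sit amet suscipit ligula",
        "posuere, justo eget malesuada porta",
        "ac facilisis ante nulla a justo",
    ],
    "Label": [0, 1, 1, 0, 1],
})

# One TF-IDF vocabulary per text column; ColumnTransformer stacks the two
# sparse matrices side by side (the "hstack" step).
preprocess = ColumnTransformer([
    ("tfidf_a", TfidfVectorizer(), "Feature_A"),  # string selector -> 1-D input
    ("tfidf_b", TfidfVectorizer(), "Feature_B"),
])

model = make_pipeline(preprocess, LogisticRegression())

X = df[["Feature_A", "Feature_B"]]
model.fit(X, df["Label"])
predictions = model.predict(X)
```

For the concatenation route, a single TfidfVectorizer over `df["Feature_A"] + " " + df["Feature_B"]` would play the same role, at the cost of no longer distinguishing which column a word came from.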

Related

Create a table in Tableau Public with word wrap on a dimension and a header on a measure shown as text

I want to include a table on a dashboard. I have a product name, a description of that product, and a value:
| Product | Description | Value |
|---|---|---|
| A | Lorem ipsum dolor sit amet consectetur adipiscing elit. | 300 |
| B | Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. | 234 |
| C | Ornare quam viverra orci sagittis eu volutpat | 496 |
| D | Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. | 220 |
| E | Lorem ipsum dolor sit amet consectetur adipiscing elit. | 105 |
| F | Aliquam ultrices sagittis orci a scelerisque purus semper eget. Id cursus metus aliquam eleifend mi. | 602 |
| G | Et tortor at risus viverra adipiscing at in tellus integer. Quisque egestas diam in arcu cursus euismod quis. | 549 |
| H | Ornare quam viverra orci sagittis eu volutpat | 419 |
I expect the table to look something like this:
This is the closest I've come:
There are at least two issues:
1. Word wrap
I can't get word wrap to work on the description. I have changed every format setting I could find to word wrap on:
2. A header for the continuous field
The Value header is missing on the continuous field because Tableau seems to want an axis to label.
You are very close.
Enable word wrap, then drag the bottom edge of the row downward to make it taller; the text will then re-flow automatically into the format you expect.
For the header, drag Measure Names to the Columns shelf and Measure Values to Text on the Marks card, then filter Measure Names so that only your measure is selected (Sales in my example; "Value" in your case).

Put a mathematical formula in the middle of a new sentence in LaTeX

I wrote down the following mathematical formula, and before it I defined it in some sentences. I would like to put this mathematical formula in the middle of a sentence. How is it possible to do this in LaTeX?
${( \,{Y}) \,}\equiv H[ \,{{p}( \,{y}) \,}] \,=\sum_{y} {p( \,{y})\: \log \,{p( \,{y}) \,}}$
As you can see, the equation above sits at the left of the line; I would like to move it to the center. I tried \hspace{3cm} to push it to the right, but I need a general method, because I have many formulas on one page and want all of them centered and aligned. I would also like to give each equation a specific number of my own, whereas \begin{equation} numbers each equation automatically.
I read your comment to the question and I can confirm what I wrote into mine. Further, you have several choices for the delimiters if you want centered but not labeled math:
\documentclass{article}
\begin{document}
$$A=A$$
\[B=B\]
\begin{displaymath}
C=C
\end{displaymath}
\end{document}
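Any of these is the proper way to center an equation, rather than approximating it with the \hspace command. Of the three, \[...\] is the form usually recommended in LaTeX, since $$...$$ is plain TeX syntax.
Also, if you want, some further reading.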
Set your math equations inside \[...\]. This centres them horizontally as a display and leaves them unnumbered. If you want to add a number manually, you can use \tag{<stuff>}, which can be given a \label and referenced later using \eqref:
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam posuere maximus diam sed mattis.
Curabitur non ultrices orci, sit amet placerat nibh. Quisque in urna in erat convallis ultrices.
Aliquam arcu diam, scelerisque in tellus sit amet, sollicitudin lobortis lacus. Nulla ut sapien
in lacus consectetur maximus.
\[
(Y) \equiv H[ p(y) ] = \sum_y p(y) \log p(y)
\]
Maecenas vitae purus vitae dolor porttitor molestie. Nulla gravida,
odio vitae congue venenatis, mi eros feugiat justo, quis rutrum mi ipsum at odio. Pellentesque
pharetra arcu sapien, feugiat lacinia arcu finibus at. Maecenas scelerisque risus sed imperdiet
venenatis. Proin pretium varius tellus luctus egestas. Maecenas risus felis, laoreet vitae mauris
viverra, dignissim rutrum nulla.
\[
(Y) \equiv H[ p(y) ] = \sum_y p(y) \log p(y) \tag{x.y}\label{eqn:x-y}
\]
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam posuere maximus diam sed mattis.
Curabitur non ultrices orci, sit amet placerat nibh. Quisque in urna in erat convallis ultrices.
Aliquam arcu diam, scelerisque in tellus sit amet, sollicitudin lobortis lacus. Nulla ut sapien
in lacus consectetur maximus. \eqref{eqn:x-y}
\end{document}
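Since the question also asks for several formulas that are centred and aligned, here is a minimal sketch using the align environment from amsmath. It reuses the formulas from the answers above; the tag text and label names (`1.a`, `x.y`, `eqn:first`, `eqn:entropy`) are placeholders chosen for illustration.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Several displayed formulas can be centred as a group and aligned at the
equals sign:
\begin{align}
  A &= A \tag{1.a}\label{eqn:first} \\
  B &= B \notag \\
  (Y) \equiv H[ p(y) ] &= \sum_y p(y) \log p(y) \tag{x.y}\label{eqn:entropy}
\end{align}
Equation~\eqref{eqn:entropy} keeps the number given by \verb|\tag|, and
the line marked with \verb|\notag| carries no number at all.
\end{document}
```

Inside align, \tag replaces the automatic number on that line and \notag suppresses it, so manual and automatic numbering can be mixed as needed.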
I'm having difficulty visualizing how you want to center the mathematical formulas in the middle of each sentence: sentences have variable lengths, so this would produce a jumbled page layout that harms readability. A cleaner way of doing it would be to center the mathematical formulas on new lines. If it were my page, I would probably do both on the same page. Sometimes I want to position an equation in the exact center of a new line in order to give it focus; otherwise I want to position the rest of the equations inline in order to improve readability. There's no reason why you can't do both, and a granular approach gives you the most control over the page's layout.
Insert the LaTeX equation anywhere in a sentence as HTML or markdown. A pair of <sub> </sub> HTML tags is used to format the y subscript that goes after the sigma character. Markdown also supports the <sub> tag.
(Y) ≡ H[p(y)] = Σ<sub>y</sub> p(y) log p(y)
I suppose there are websites for converting LaTeX code to other formats, but the method I use is to convert from LaTeX to an image in a word processor by following the steps in this answer and then manually type the plain text that's in the image. Another way to do it is with the KLatexFormula open source app. I searched Wikipedia for the triple bar character and sigma character that are seen in the above mathematical formula.

correctly display indentation inside <pre> tag

What is the recommended method of displaying preformatted text (especially code samples with significant whitespace) with Slim?
For example, how to display the below (with correct indentation) using Slim:
<pre>
Vivamus eu lacinia nisi
Nam pretium urna magna
Donec sit amet enim ac augue luctus pharetra.
Pellentesque dictum
Enim vel
Cras risus lectus
</pre>
Embedded engines are an acceptable answer, but not preferred.
Well, all you need to do is use the pipe:
pre
  | Vivamus eu lacinia nisi
    Nam pretium urna magna
    Donec sit amet enim ac augue luctus pharetra.
    Pellentesque dictum
    Enim vel
    Cras risus lectus
Everything after the pipe is copied over. See the slim docs: http://rdoc.info/gems/slim/file/README.md#Text__

Rails gem to break a paragraph into series of sentences

I'm trying to split a paragraph into a series of sentence groups such that each group stays under N characters. If a single sentence is longer than N, it should be split into chunks, using punctuation marks or spaces as separators.
E.g., if N = 50, then the following string
"Lorem ipsum, consectetur elit. Donec ut ligula. Sed acumsan posuere tristique. Sed et tristique sem. Aenean sollicitudin, sapien sodales elementum blandit. Fusce urna libero blandit eu aliquet ac rutrum vel tortor."
would become
["Lorem ipsum, consectetur elit. Donec ut ligula.", "Sed acumsan posuere tristique.", "Sed et tristique sem.", "Aenean sollicitudin,", "sapien sodales elementum blandit.", "Fusce urna libero blandit eu aliquet ac rutrum vel", "tortor."]
Are there any rails gems that could help me to achieve this? I looked at html_slicer, but I'm not sure it can handle the example above.
There are two non-trivial tasks in what you are after:
1. splitting a string into sentences, and
2. word-wrapping each sentence with extra care for punctuation.
I think the first one is not easy to implement from scratch so your best bet might just be to use natural language processing libraries provided that your "third-party language processing service" doesn't have such a feature. I don't know any "rails gem" to meet your requirement.
Here is just a toy example of splitting a string into sentences using stanford-core-nlp.
require 'stanford-core-nlp'
text = "Lorem ipsum, consectetur elit. Donec ut ligula. Sed acumsan posuere tristique. Sed et tristique sem. Aenean sollicitudin, sapien sodales elementum blandit. Fusce urna libero blandit eu aliquet ac rutrum vel tortor."
pipeline = StanfordCoreNLP.load(:tokenize, :ssplit)
a = StanfordCoreNLP::Annotation.new(text)
pipeline.annotate(a)
sentences = a.get(:sentences).to_a.map(&:to_s) # Map with to_s if you want an array of sentence strings.
# => ["Lorem ipsum, consectetur elit.", "Donec ut ligula.", "Sed acumsan posuere tristique.", "Sed et tristique sem.", "Aenean sollicitudin, sapien sodales elementum blandit.", "Fusce urna libero blandit eu aliquet ac rutrum vel tortor."]
The second problem is similar to word-wrapping, and if it were exactly a word-wrapping problem it could easily be solved with an existing implementation such as ActionView::Helpers::TextHelper.word_wrap.
However, there is an extra requirement concerning punctuation. I don't know of any existing implementation that achieves exactly your goal, so you may have to come up with your own solution.
My only idea is to first word-wrap each sentence, then split each line at punctuation marks, and finally join the pieces again subject to the length limit. I am not sure this would work, though.
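To make the idea concrete, here is a rough sketch of one greedy variant. It is written in Python purely for illustration (the logic ports directly to Ruby). The function names are hypothetical. The regex sentence splitter is a naive stand-in for a real NLP splitter such as the stanford-core-nlp pipeline above. Over-long sentences are cut after the last comma, semicolon or colon that fits, falling back to the last space, and their chunks are not merged with neighbouring sentences.

```python
import re

# Punctuation that may end a chunk inside an over-long sentence.
PUNCT = re.compile(r"[,;:](?=\s)")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def split_long(sentence, limit):
    """Cut one over-long sentence into chunks of at most `limit` characters."""
    chunks = []
    rest = sentence.strip()
    while len(rest) > limit:
        # Prefer the last punctuation mark that still fits in the limit.
        cuts = [m.end() for m in PUNCT.finditer(rest) if m.end() <= limit]
        if cuts:
            cut = cuts[-1]
        else:
            # Otherwise the last space that fits, else a hard cut.
            space = rest.rfind(" ", 0, limit + 1)
            cut = space if space > 0 else limit
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks


def pack_sentences(sentences, limit):
    """Greedily join whole sentences while the group fits in `limit`."""
    groups = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if len(sentence) > limit:
            if current:
                groups.append(current)
                current = ""
            groups.extend(split_long(sentence, limit))
        elif not current:
            current = sentence
        elif len(current) + 1 + len(sentence) <= limit:
            current += " " + sentence
        else:
            groups.append(current)
            current = sentence
    if current:
        groups.append(current)
    return groups


text = ("Lorem ipsum, consectetur elit. Donec ut ligula. Sed acumsan posuere "
        "tristique. Sed et tristique sem. Aenean sollicitudin, sapien sodales "
        "elementum blandit. Fusce urna libero blandit eu aliquet ac rutrum vel "
        "tortor.")
chunks = pack_sentences(split_sentences(text), 50)
```

With N = 50 this reproduces the array given in the question. A chunk of exactly 50 characters is allowed, as it is in the question's expected output.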

Vertical line with every quotation

I often want to include side comments in my text that aren't closely related to the topic being discussed. I usually use the quotation environment for this because of its large left indent. A comment can be long, and it can include formulas, code listings, nested quotations, and so on.
How can I make the quotation environment draw a long vertical line to the left of all its content? You often see this style on the Web for actual quotes.
A Google search turned up one solution:
\begin{flushleft}
\hbox{%
\vrule\hspace{.5em}\parbox{.9\textwidth}%
{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi id hendrerit
nunc. Sed scelerisque lacus vitae erat eleifend eleifend. Donec eros mi, placerat
in porta eleifend, placerat a urna. Pellentesque venenatis neque non turpis
convallis vehicula. Aliquam aliquet ultricies tincidunt.}}
\end{flushleft}
But it cannot deal with code listings etc. inside of the text.
Thank you for your advice. Sorry if my English wasn't understandable enough.
Have you tried using a tabular environment?
Here is some code that creates a vertical line for the text you have given above,
\begin{tabular}{|p{10cm}}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi id hendrerit
nunc. Sed scelerisque lacus vitae erat eleifend eleifend. Donec eros mi, placerat
in porta eleifend, placerat a urna. Pellentesque venenatis neque non turpis
convallis vehicula. Aliquam aliquet ultricies tincidunt.\\
\end{tabular}
You need the p{10cm} to define the width of the column for the text to wrap, otherwise it goes off the page. You can change the value depending on your margins and paper format.
Here is the result,
As far as I know, the tabular environment also works with the listings package for code listings.
I would suggest using the leftbar environment from the framed package:
\usepackage{framed}
\newenvironment{quotationb}%
{\begin{leftbar}\begin{quotation}}%
{\end{quotation}\end{leftbar}}
Then you can use \begin{quotationb}......\end{quotationb}
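For completeness, a minimal sketch of a full document using that environment, with a formula and a verbatim listing inside the aside. The sample text, formula and listing are placeholders, and I have only tried to keep it to standard LaTeX plus the framed package:

```latex
\documentclass{article}
\usepackage{framed}
\newenvironment{quotationb}%
  {\begin{leftbar}\begin{quotation}}%
  {\end{quotation}\end{leftbar}}
\begin{document}
Main text before the aside.
\begin{quotationb}
An aside with a formula, $E = mc^2$, and a short listing:
\begin{verbatim}
puts "hello"
\end{verbatim}
\end{quotationb}
Main text after the aside.
\end{document}
```

Because quotationb is an environment rather than a command taking the text as an argument, verbatim material inside it should be fine, which is where the \parbox approach from the question runs into trouble.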
