Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
Improve this question
I want to solve the predicting house pricing problem (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)
How could I transform string data into numerical data in Octave?
The link is paywalled, but its title mentions the word 'categorical', so I'm assuming that by 'numerical' you mean integer labels, rather than parsing a string that represents a number to its equivalent float.
So with that in mind, here's a typical way to represent this.
Indices = [ 1,2,3,2,3,2,1,2,1,2,3,1,3,3,1 ];
Labels = { 'class1', 'class2', 'class3' };
It really is as simple as that. If you really want this to be a single 'variable', you can collect it into a struct:
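MyCategoricalVariable = struct( 'indices', Indices, 'labels', {Labels} );
Note the extra braces around Labels: passing a bare cell array to struct() would create a struct array with one element per cell instead of a single struct. Obviously it depends on how the data is provided to you in the first place. If you're given the strings instead of the indices, you can convert them to an indices/labels pair like so: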
Data = { 'a', 'b', 'c', 'c', 'b', 'c', 'b', 'a', 'a', 'a', 'b' };
Labels = unique( Data );
[~, Indices] = ismember( Data, Labels )
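For comparison outside Octave, the same unique/ismember idea can be sketched in Python. This is only an illustration: the sample data is copied from the snippet above, and the 1-based numbering is chosen to mirror Octave's indexing.

```python
# Sample strings copied from the Octave example above.
data = ["a", "b", "c", "c", "b", "c", "b", "a", "a", "a", "b"]

# Sorted distinct values, comparable to unique() on a cell array of strings.
labels = sorted(set(data))

# Map every label to its 1-based position, then translate the data.
position = {label: i for i, label in enumerate(labels, start=1)}
indices = [position[value] for value in data]

print(labels)   # ['a', 'b', 'c']
print(indices)  # [1, 2, 3, 3, 2, 3, 2, 1, 1, 1, 2]
```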
There are two kinds of string data:
If a column holds just one or two words and takes its values from a fixed set (the text does not vary freely), you can use labels as described above. That would solve your problem.
If a column can hold arbitrary text of varying length, you can first compute TF-IDF features and train your model on those.
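To make the TF-IDF suggestion concrete, here is a minimal Python sketch of one plain variant (term frequency times log of inverse document frequency). The function name `tf_idf`, the whitespace tokenisation, and the sample texts are assumptions for illustration; real libraries use smoothed or normalised variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (plain tf * log(N/df))."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical free-text column values.
weights = tf_idf(["large garden", "small garden"])
print(weights)
```

A term that appears in every document ("garden" here) gets weight zero, while terms that distinguish a document get a positive weight.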
This question already has answers here:
ultimate short custom number formatting - K, M, B, T, etc., Q, D, Googol
(3 answers)
Closed 7 months ago.
I currently have a sheet with numbers displayed in the "K,M,B" format (e.g: 1.2K, 5M, 1.3B).
And I am currently trying to make a function that converts this to the numerical values on a separate sheet.
For example, 1.2k would be displayed as 1200 and so on.
Currently I have:
=SUBSTITUTE(Shorts!B2,"K","")*1000
However, I would also like this SUBSTITUTE functions to handle the case for M (million) and B (billion) so I can drag the cell down the column.
But when I add more multiplications to the nested functions
=SUBSTITUTE(SUBSTITUTE(Shorts!B2,"K","")*1000,("M","")*1000000
it doesn't seem to work and I get a formula parse error.
Any guidance would be much appreciated.
try:
=INDEX(IF(REGEXMATCH(A1:A4&"", "(?i)M"),
REGEXEXTRACT(A1:A4, "\d+\.\d+|\d+")*1000000,
IF(REGEXMATCH(A1:A4&"", "(?i)K"),
REGEXEXTRACT(A1:A4, "\d+\.\d+|\d+")*1000,
IF(REGEXMATCH(A1:A4&"", "(?i)B"),
REGEXEXTRACT(A1:A4, "\d+\.\d+|\d+")*1000000000, A1:A4))))
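If you want to check the conversion logic outside Sheets, the same idea (pull out the number, then multiply by the factor for its suffix) can be sketched in Python. The names `MULTIPLIERS` and `expand_short_number` are made up for this example, and treating the suffix as case-insensitive is an assumption based on the question mentioning both "1.2K" and "1.2k".

```python
import re

# Factor for each suffix; a missing suffix means the number is used as-is.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# A number with an optional decimal part, followed by an optional K/M/B.
PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMB]?)\s*", re.IGNORECASE)


def expand_short_number(text):
    """Convert strings such as '1.2K' or '5M' to a float."""
    match = PATTERN.fullmatch(str(text))
    if match is None:
        raise ValueError(f"unrecognised short number: {text!r}")
    number, suffix = match.groups()
    return float(number) * MULTIPLIERS.get(suffix.upper(), 1)


for sample in ["1.2K", "5M", "1.3B", "42"]:
    print(sample, expand_short_number(sample))
```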
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 months ago.
I want to use CONCAT(A1:A), but I need the range to include only the filled cells in column A.
https://docs.google.com/spreadsheets/d/1d6ac8ODvOlu__wmPmyz_Lcoikq40UbZVZE-d7QXcZI0/edit?usp=sharing
To concatenate all non-blank values in column A1:A, joining them with a line break, use textjoin(), like this:
=textjoin( char(10); true; A1:A )
See your sample spreadsheet.
To concatenate all values in column Y1:Y, joining them with a comma and a space, ignoring error values, and removing line breaks, use iferror(), textjoin() and substitute(), like this:
=substitute( textjoin( ", "; true; iferror(Y1:Y) ); char(10); "" )
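As a rough model of what textjoin() does when its second argument is true, here is a small Python sketch that joins only the non-blank values. The helper name `text_join` and the sample column are assumptions for illustration.

```python
def text_join(delimiter, values):
    """Join values with delimiter, skipping blanks (None or empty string)."""
    return delimiter.join(
        str(value) for value in values if value is not None and str(value) != ""
    )


# Hypothetical column contents with gaps.
column_a = ["alpha", "", "beta", None, "gamma", ""]
print(text_join("\n", column_a))
print(text_join(", ", column_a))
```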
To get the last value in a column, use sort(), like this:
=+sort(A1:A, not(isblank(A1:A)), false, row(A:A), false)
To get the row number of the last row with content, use filter():
=+sort( filter(row(A1:A), len(A1:A)), 1, false )
In the event you really need the cell address, use address(), like this:
=+sort( filter( address(row(A1:A), column(A1:A), 4), not(isblank(A1:A)) ), filter( row(A1:A), not(isblank(A1:A)) ), false )
This last formula will get the cell address in A1 notation, which is not the same as a reference to a cell. To turn it into a reference, use indirect().
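The three "last filled cell" formulas all answer the same question: which is the last non-blank row, and what does it hold? A minimal Python sketch of that logic, assuming the column is given as a list and rows are numbered from 1 as in Sheets (the name `last_filled` is hypothetical):

```python
def last_filled(values):
    """Return (row_number, value) of the last non-blank entry, or None."""
    for row_number in range(len(values), 0, -1):
        value = values[row_number - 1]
        if value is not None and value != "":
            return row_number, value
    return None


# Hypothetical column A with trailing blanks.
column_a = ["x", "", "y", "", None]
row_number, value = last_filled(column_a)
print(value)              # last value in the column
print(row_number)         # row number of the last row with content
print(f"A{row_number}")   # the address in A1 notation, as plain text
```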
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I'm building a google sheet to keep track of my money.
I'm pretty new to all this and I'm learning how to do a few things.
I'd like to build a table where I can have categories (in green) and their subcategories. So I'd like to be able to have the sum of the sub categories in front of their parent category.
Also, this is a template that is probably going to change over time, so I'd like it to be modular (it would be simple otherwise) if I add a new subcategory or even a new category.
Thanks for the help.
Here is what I have managed to find and edit, maybe it's a good path:
=IF($A3="✹";SUM(E3:INDEX(E3:E;MATCH(TRUE;(A3:A="✫");3)));"")
But it actually sums every row that has a star in column A. It doesn't stop until it meets a sun.
here is the link to access the sheet
Also, I know that Aspire Budget works that way, but it's a bit too complicated and I need help to understand it...
try:
=SUM(IFERROR(INDIRECT(ADDRESS(IF(INDIRECT(ADDRESS(ROW()+1, 1))="✫", ROW()+1, "×"), 5)&":"&
ADDRESS(MATCH("✹", INDIRECT("A"&ROW()+1&":A"), 0)+ROW()-1, 5))))
Here is a function that works:
=
if(
indirect("A"&(row()+1))="✫",
sum(
indirect(
"E"&(row()+1)&":E"&vlookup(
"✹",
{
indirect("A"&(row()+1)&":A"),
arrayformula(row(indirect("A"&(row()+1)&":A")))
},
2,
FALSE
)-1
)
),
0
)
The basic idea is to search for the next ✹, then take the range between the two category rows and sum it.
Formula rundown
Step 1: Get the values starting after this row
This can be done using indirect. Basically it's A<next row>:A.
The current row number is simply row(), and the next one is row()+1.
So using indirect we get:
=
indirect("A"&(row()+1)&":A")
Step 2: Find the next category
We want to have the number of the next category. This can be achieved using vlookup.
What we do is build an array with the values in the first column and their row numbers in the second column. row() gives the row number of a cell, so the array would look like:
{
<range>,
arrayformula(row(<range>))
}
We need arrayformula so that row() is evaluated for every cell in the range rather than only for the first one.
<range> would be the value on the last step so together it looks like:
=
{
indirect("A"&(row()+1)&":A"),
arrayformula(row(indirect("A"&(row()+1)&":A")))
}
Now we need to add the vlookup:
=
vlookup(
"✹",
{
indirect("A"&(row()+1)&":A"),
arrayformula(row(indirect("A"&(row()+1)&":A")))
},
2,
FALSE
)
This returns the second column (the row number) of the first row that contains ✹. So now we have the row number of the next category.
Step 3: Get the range in the middle
Now that we have the value we can get the range of the values to sum. This would be E<next row>:E<row before next category> or with a function:
indirect(
"E"&<next row>&":E"&<next category row>-1
)
so next row is row()+1 and the next category row is the result of the last step. Together:
=
indirect(
"E"&(row()+1)&":E"&vlookup(
"✹",
{
indirect("A"&(row()+1)&":A"),
arrayformula(row(indirect("A"&(row()+1)&":A")))
},
2,
FALSE
)-1
)
Step 4: sum
Now we can add sum to all of it. Nothing complicated here.
Step 5: Add conditional for empty categories
Some categories are empty so we need to check that the next row is an entry. This can be done with the simple if:
if(
indirect("A"&(row()+1))="✫",
<sum formula>,
0
)
So if you put everything together you get the result.
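The steps above can also be sketched in Python to check the logic away from the sheet. This is an illustration under assumptions: column A is a list of markers, column E a list of numeric amounts, and the name `category_totals` is made up. Unlike the formula, the sketch also handles the last category by summing to the end of the data when no further ✹ is found.

```python
CATEGORY = "✹"
SUBCATEGORY = "✫"


def category_totals(markers, amounts):
    """For every category row, sum the amounts up to the next category row."""
    totals = [None] * len(markers)
    for row, marker in enumerate(markers):
        if marker != CATEGORY:
            continue
        # Step 5: a category not followed by a subcategory totals 0.
        if row + 1 >= len(markers) or markers[row + 1] != SUBCATEGORY:
            totals[row] = 0
            continue
        # Step 2: find the next category row (or the end of the data).
        next_category = len(markers)
        for candidate in range(row + 1, len(markers)):
            if markers[candidate] == CATEGORY:
                next_category = candidate
                break
        # Steps 3 and 4: sum the rows in between.
        totals[row] = sum(amounts[row + 1:next_category])
    return totals


# Hypothetical sheet contents: columns A and E.
markers = ["✹", "✫", "✫", "✹", "✹", "✫"]
amounts = [0, 10, 5, 0, 0, 7]
print(category_totals(markers, amounts))
```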
Final thoughts
Even though it seems massive the formula is not too complex. Try looking into it and let me know if there is something that's not clear enough.
References
VLOOKUP (Docs Editor Help)
SUM (Docs Editor Help)
ROW (Docs Editor Help)
INDIRECT (Docs Editor Help)
IF (Docs Editor Help)
ARRAYFORMULA (Docs Editor Help)
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
So I have this data with a lot of different PINs. Each PIN corresponds to a product, so PIN 1205463MB means a product. I want to write the PIN in a cell and have the product name written automatically in the next column.
I tried the VLOOKUP function and it didn't work. Any suggestions?
I already tried this formula I found in a Stack Overflow post, but it didn't work either.
=INDEX($B$4:$K$9;MATCH($A$17;$A$5:$A$9;0);COLUMN(A4))
Also, more PINs will be added in the near future, so the solution needs to be easy to adapt.
You can use the VLOOKUP as it is exactly what you are asking to do:
In this image you can see on the left I have the table with the pins and its product names.
Then on the right, I put another column where I will manually insert the PIN. The column next to it has a dynamic formula (which means that it will fill in the rows below if necessary) that will look for the value you put on the left.
=ArrayFormula( if( len(D2:D), VLOOKUP($D2:D, A2:B16, 2, false),))
The arguments of VLOOKUP are:
1. $D2:D is the column with the manually inserted PINs.
2. A2:B16 is the whole table in which you are looking up the value.
3. 2 is the number of the column you want to retrieve (the product name).
4. false is the flag saying the table is not sorted.
This means every PIN you enter in column D gets its product name filled in automatically in the column with the formula. The dynamic formula fills the whole column: when I put a random PIN in cell D6, it filled in the product name as well.
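The lookup itself can be modelled in Python as a dictionary lookup, which may help when reasoning about what the formula returns. This is a sketch under assumptions: the function name `lookup_products` is made up, the product names are invented, and only the PIN 1205463MB comes from the question.

```python
def lookup_products(entered_pins, table):
    """Return the product name for each entered PIN, like an exact-match VLOOKUP."""
    product_by_pin = {}
    for pin, product in table:
        # Keep the first match, as VLOOKUP does when a key is repeated.
        product_by_pin.setdefault(pin, product)
    # Blank input stays blank (the len() check); unknown PINs give "#N/A".
    return [
        "" if pin == "" else product_by_pin.get(pin, "#N/A")
        for pin in entered_pins
    ]


# Hypothetical table (columns A and B) and entered PINs (column D).
table = [("1205463MB", "Product A"), ("9900001XY", "Product B")]
print(lookup_products(["9900001XY", "", "1205463MB", "0000000ZZ"], table))
```

Adding a new PIN then only means adding a row to the table; the lookup code does not change.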
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have data from multiple tabs in Google Sheets that I am combining into one master tab. I would then like the data on the master tab to be alphabetized based on the last name in one column automatically if more information is added. Here is the formula I used to combine the multiple tabs onto the master tab:
={filter(tab1!A5:Z, tab1!B5:B<>"");FILTER(tab2!A5:Z, tab2!B5:B<>""); FILTER(tab3!A5:Z, tab3!B5:B<>""); FILTER(tab4!A5:Z, tab4!B5:B<>"");FILTER(tab5!A5:Z, tab5!B5:B<>""); FILTER(tab6!A5:Z, tab6!B5:B<>"")}
I have not been able to find anything that would work when I google "how to sort data in abc order from a filter function google sheets".
You can use SORT() and there's even an example in its documentation that applies to your situation:
SORT({1, 2; 3, 4; 5, 6}, 2, FALSE)
So if the last name is in column B, which is the second column, you could write (formatted for clarity):
=SORT(
{
FILTER(tab1!A5:Z, tab1!B5:B<>"");
FILTER(tab2!A5:Z, tab2!B5:B<>"");
FILTER(tab3!A5:Z, tab3!B5:B<>"");
FILTER(tab4!A5:Z, tab4!B5:B<>"");
FILTER(tab5!A5:Z, tab5!B5:B<>"");
FILTER(tab6!A5:Z, tab6!B5:B<>"")
},
2,
TRUE
)
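The same stack, filter and sort pipeline can be sketched in Python to show what the nested formula computes. The function name `combine_and_sort`, the sample tabs, and the lowercasing of the sort key (so that letter case does not affect the order) are assumptions for illustration.

```python
def combine_and_sort(tabs, key_column=1):
    """Stack rows from all tabs, drop rows with a blank key, sort by the key."""
    rows = [
        row
        for tab in tabs
        for row in tab
        if row[key_column] != ""          # FILTER(..., B5:B<>"")
    ]
    # SORT(..., 2, TRUE): ascending by the second column (index 1 here).
    return sorted(rows, key=lambda row: row[key_column].lower())


# Hypothetical tabs; each row is (first name, last name).
tab1 = [("Ann", "Young"), ("", ""), ("Bob", "Adams")]
tab2 = [("Cid", "Moore")]
for row in combine_and_sort([tab1, tab2]):
    print(row)
```

Because the sort runs over the combined result, rows added to any tab land in the right position the next time it is evaluated, which is the behaviour the SORT wrapper gives in the sheet.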