Having trouble dealing with NA values in YEAR column - machine-learning

I am cleaning a housing dataset to build a model, and I am stuck on a step where the GarageYrBlt column contains NA values. Those houses don't have a garage, so GarageYrBlt is NA for them. How should I handle them?
Here's my dataset:
   Id  GarageType  GarageYrBlt
1   1  Attchd      2003
2   2  Attchd      1976
3   3  Attchd      2001
4   4  Detchd      1998
5   5  Attchd      2000
6   6  No Garage   NA
These are just sample rows. I have a big data set with lots of NA values.

Year can be a valuable feature for both regression and classification problems. In this case, you can try label encoding the year column so that all NA values are given one code. Since this column is connected to the garage type, as you mentioned, it is better not to eliminate those rows.
Hope this is useful. Thank you.
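To make the label-encoding idea concrete, here is a minimal sketch in plain Python using the sample rows from the question. The function name `label_encode` and the choice of 0 as the code for NA are assumptions made for this illustration, not something the answer specifies.

```python
# Sample GarageYrBlt values from the question; None stands in for NA (no garage).
garage_year_built = [2003, 1976, 2001, 1998, 2000, None]


def label_encode(values, missing_code=0):
    """Give each distinct year a code 1, 2, ... in sorted order.

    Every missing value (None) receives the same code, `missing_code`.
    """
    distinct = sorted({v for v in values if v is not None})
    codes = {value: index for index, value in enumerate(distinct, start=1)}
    return [missing_code if v is None else codes[v] for v in values]


encoded = label_encode(garage_year_built)
print(encoded)  # [5, 1, 4, 2, 3, 0]
```

All rows are kept, and the "no garage" rows share a single code that a model can treat as its own category.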

Related

Using postgres percentile function with negative numbers

I have a table of containing records with negative numbers:
ID  Location   Temperature
1   Paris      -1
2   London     -2
3   Berlin     -3
4   Moscow     -4
5   Rome       -5
6   Warsaw     -6
7   Madrid     -7
8   Amsterdam  -8
9   Milan      -9
10  Zurich     -10
(my actual records and values are more numerous and more complex, but this should help illustrate the issue)
I want to get the minimum, first quartile, median, third quartile, maximum of the temperature values, but in reverse.
For instance, in my example I would have:
Aggregate       Value
Minimum         -1
First quartile  -2.5
Median          -5
Third quartile  -7.5
Maximum         -10
The problem as I see it is that my numbers are negative. So when I run:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "city_temperatures"."temperature") AS percentile_temperature FROM "city_temperatures"
I actually get the third quartile value as opposed to the first quartile.
What's the best way to handle negative numbers in a query like this?
Add DESC to ORDER BY?
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY t.temperature DESC) AS pct_temp
FROM city_temperatures t;
You can get all of them as an array in a single call with:
SELECT percentile_cont('{0,0.25,0.5,0.75,1}'::float8[])
WITHIN GROUP (ORDER BY t.temperature DESC) AS pct_temps
FROM city_temperatures t;
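As a rough illustration of why the sort direction matters, here is a minimal Python sketch of continuous-percentile interpolation. It assumes PostgreSQL's documented behaviour of interpolating linearly between adjacent sorted values. The function and its `descending` parameter are invented for this example and are not part of any library.

```python
def percentile_cont(values, fraction, descending=False):
    """Continuous percentile with linear interpolation between neighbours.

    `descending=True` plays the role of ORDER BY ... DESC in the SQL above.
    """
    ordered = sorted(values, reverse=descending)
    if not ordered:
        raise ValueError("values must not be empty")
    position = fraction * (len(ordered) - 1)   # zero-based fractional index
    lower = int(position)                      # floor, since position >= 0
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


temperatures = [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10]

for fraction in (0, 0.25, 0.5, 0.75, 1):
    print(fraction, percentile_cont(temperatures, fraction, descending=True))
```

With the ten sample temperatures, the 0.25 fraction gives -7.75 in ascending order and -3.25 in descending order, so interpolation will not reproduce the rounded figures in the question's table exactly.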

Get previous score of user x

Consider the following data set:
A:User | B:Date | C:Score | D:DiffLastResult
1: John 2021-01-01 7
2: Jane 2021-01-01 7
3: James 2021-01-01 8
4: John 2022-01-01 4
5: Jane 2022-01-01 9
6: James 2022-01-01 10
7: John 2022-06-01 10
8: Jane 2022-06-01 5
9: James 2022-06-01 7
Now, I want in column D to have the abs difference between the current score and the previous score (for the given user). So, for instance, user James' last score is 7. Previous score of James was 10, so the delta is minus 3, which should be displayed in cell D9. In cell D6, I want to have a value of 2 (10-8, previous score of James, in context of the score of 2022-01-01).
This list is simplified for the purpose of asking this question. In my real file, the list of names is unordered and non-repetitive (not all users have the same number of scores).
I am using Google Sheets. I have tried using vlookup, lookup, and index/match combinations, but I keep getting the first score of James (instead of the previous one). The list is sorted on date ASC.
Can somebody point me in the right direction? Many thanks.
try:
=ARRAYFORMULA(IFNA(VLOOKUP(
A1:A&COUNTIFS(A1:A, A1:A, ROW(A1:A), "<="&ROW(A1:A))-1, {
A1:A&COUNTIFS(A1:A, A1:A, ROW(A1:A), "<="&ROW(A1:A)), C1:C}, 2, )))
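For readers who want to check the expected values outside of Sheets, the same "previous score per user" lookup can be sketched in Python. This is only an illustration under the question's assumption that rows are sorted by date ascending. The function name is hypothetical, and it returns the signed difference (current minus previous), which matches the -3 and 2 given in the question.

```python
# Rows from the question: (user, date, score), sorted by date ascending.
rows = [
    ("John", "2021-01-01", 7),
    ("Jane", "2021-01-01", 7),
    ("James", "2021-01-01", 8),
    ("John", "2022-01-01", 4),
    ("Jane", "2022-01-01", 9),
    ("James", "2022-01-01", 10),
    ("John", "2022-06-01", 10),
    ("Jane", "2022-06-01", 5),
    ("James", "2022-06-01", 7),
]


def diff_from_previous(rows):
    """Return current score minus the same user's previous score per row.

    The first score of each user has no predecessor, so it yields None.
    """
    last_score = {}  # most recent score seen so far, per user
    diffs = []
    for user, _date, score in rows:
        previous = last_score.get(user)
        diffs.append(None if previous is None else score - previous)
        last_score[user] = score
    return diffs


print(diff_from_previous(rows))
```

Wrapping `score - previous` in `abs()` would give the absolute difference instead.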

How to iterate through a function in google sheets?

On one sheet I have a table of statistics similar to this:
A B C D
1 Teams MP GF GA
2 Team A 3 3 2
3 Team B 2 1 3
4 Team C 3 5 2
5 Team D 2 2 1
I then have some formulas that calculate an expected score between two teams set up like this:
A B C D E
7 Teams GF/G GA/G Avg Exp Score
8 Team 1 =VLOOKUP(A8,$A$1:$D$5,3)/VLOOKUP(A8,$A$1:$D$5,2) =VLOOKUP(...) =AVERAGE(...) =B8-C9+D8
9 Team 2 =VLOOKUP(...) =VLOOKUP(...) =AVERAGE(...) =B9-C8+D9
I then have a separate sheet that has the matchups between teams like this:
A B C
1 Date Matchup Exp Score
2 11/15 Team D =FORMULA(
3 11/15 Team B =FORMULA(
4 11/16 Team C =FORMULA(
5 11/16 Team A =FORMULA(
6 11/17 Team B =FORMULA(
7 11/17 Team C =FORMULA(
8 11/17 Team D =FORMULA(
9 11/17 Team A =FORMULA(
My question is if there is some kind of formula that can take the teams in the matchup, copy and paste them behind the scenes into cells A8 and A9, and spit out the Exp Score that would generate in E8 and E9. Is this something that is possible to do in Google Sheets or does it have to be manually copied and pasted into the cells and then copy and paste the results to where I want them?
I've put your formulas together, and come up with the following result, but I think possibly your logic for the average is a little bit off.
Should it not be:
=SUM(C2:C5) / (SUM(B2:B5) / 2)
So the sum of all the goals scored divided by the total number of matches (the number of times any team played, divided by 2)? This gives the average goals per game, and then your other formulas add a positive delta to the team with the stronger GF/G, and a negative delta to the team with the weaker GF/G.
Also, your data may not be valid. Shouldn't the total number of goals scored BY all teams, also equal the total number of goals scored AGAINST all teams? So the sum of column C must equal the sum of column D? I therefore changed the numbers in column D slightly.
The result then for your data looks like this:
where the formula in E1 is:
=ArrayFormula({"GF/G";C2:C5/$B2:$B5})
and in G1 is:
=ArrayFormula({"Avg Goals/G";SUM($C$2:$C$5) / (SUM(B2:B5)/2) })
Adding in your matchups and projected scores, I get this:
where the projected scores for all the teams in column I are given by this formula, in K2:
=ArrayFormula(vlookup(I2:I7,$A$2:$G$5,5)
- vlookup(J2:J7,$A$2:$G$5,5)
+ $G$2/2)
Note that I've duplicated columns K:M in columns N:P, but shown with a decimal place to show the average goals per game still equals 2.2, but with rounding adjustments (no fractions of a goal) it doesn't always work out right.
Here is my sample sheet.
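The arithmetic behind these formulas can be checked with a short Python sketch. It uses the MP and GF columns from the question and mirrors the lookup formula as written: each side's GF/G, minus the opponent's GF/G, plus half the average goals per game. The pairing of teams into matchups is an assumption read off the question's matchup list (consecutive rows form a pair), and all names are illustrative.

```python
# (matches played, goals for) per team, taken from the question's table.
stats = {
    "Team A": (3, 3),
    "Team B": (2, 1),
    "Team C": (3, 5),
    "Team D": (2, 2),
}

goals_per_game = {team: gf / mp for team, (mp, gf) in stats.items()}

# Each match involves two teams, so total matches = sum of MP / 2.
total_goals = sum(gf for _mp, gf in stats.values())
total_matches = sum(mp for mp, _gf in stats.values()) / 2
avg_goals_per_game = total_goals / total_matches


def projected_score(team, opponent):
    """Projected goals for `team`: its GF/G edge plus half the average."""
    return (goals_per_game[team] - goals_per_game[opponent]
            + avg_goals_per_game / 2)


# Assumed pairing: consecutive rows of the question's matchup sheet.
matchups = [("Team D", "Team B"), ("Team C", "Team A"),
            ("Team B", "Team C"), ("Team D", "Team A")]

for first, second in matchups:
    print(first, round(projected_score(first, second), 2),
          second, round(projected_score(second, first), 2))
```

Because the two GF/G deltas cancel, the projected scores in each matchup add up to the average goals per game (2.2 for this data).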

How to count only specific word in googlesheets

For example, I have data like this:
Ambassador Classic Nova Diesel
Audi A3 35 TDI Attraction
Audi A3 35 TDI Premium
Ford Figo Diesel EXI
Ford Figo Diesel EXI
Honda Accord 2.4 A/T
Honda Accord 2.4 A/T
Honda WRV i-VTEC VX
Honda WRV i-VTEC VX
Hyundai Accent CRDi
Hyundai Accent CRDi
Mini Cooper Countryman D High
Mini Cooper S
Mini Cooper S Carbon Edition
I want to count only the brand of each car. How do I do it?
You can do this with two simple steps in Microsoft Excel:
Select Data -> Text to Columns and then Delimited -> Tab. This will assign the brand and the model details to separate cells.
Mark the data and select Insert -> Pivot Table. Select the brand column and set the value to Count. This should solve your problem.
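The split-then-count idea in these two steps can also be sketched in Python. This assumes the brand is always the first word of each entry, which holds for the sample data but might not for multi-word brand names. The variable names are illustrative.

```python
from collections import Counter

# Sample entries from the question.
cars = [
    "Ambassador Classic Nova Diesel",
    "Audi A3 35 TDI Attraction",
    "Audi A3 35 TDI Premium",
    "Ford Figo Diesel EXI",
    "Ford Figo Diesel EXI",
    "Honda Accord 2.4 A/T",
    "Honda Accord 2.4 A/T",
    "Honda WRV i-VTEC VX",
    "Honda WRV i-VTEC VX",
    "Hyundai Accent CRDi",
    "Hyundai Accent CRDi",
    "Mini Cooper Countryman D High",
    "Mini Cooper S",
    "Mini Cooper S Carbon Edition",
]

# Take the first whitespace-separated word as the brand and tally it.
brand_counts = Counter(entry.split()[0] for entry in cars)

for brand, count in sorted(brand_counts.items()):
    print(brand, count)
```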

Compute the Sum of Columns in SPSS

I have a lot of columns in SPSS and for a calculation, I need to get the sum of each and every one of them. Is there a way to do this in SPSS?
An example of what I mean is shown below:
age    gender    question 1    question 2
-----------------------------------------
25     m         2             3
19     f         4             2
20     f         3             4
                 ----------    ----------
                 need sum      need sum
If you just need an output table with the results, see the DESCRIPTIVES command.
Alternatively, if you need the results in an output dataset for further processing then see the AGGREGATE command.
Alternatively, use Analyze > Reports > Report Summaries in Columns and add your columns.
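To show the result being asked for, here is a small Python sketch that sums the two question columns from the example. It is not SPSS syntax, only a way to check the expected totals. The row layout and function name are assumptions for this illustration.

```python
# Example rows from the question, one dict per case.
rows = [
    {"age": 25, "gender": "m", "question 1": 2, "question 2": 3},
    {"age": 19, "gender": "f", "question 1": 4, "question 2": 2},
    {"age": 20, "gender": "f", "question 1": 3, "question 2": 4},
]


def column_sums(rows, columns):
    """Return the sum of each named column across all rows."""
    return {column: sum(row[column] for row in rows) for column in columns}


sums = column_sums(rows, ["question 1", "question 2"])
print(sums)  # {'question 1': 9, 'question 2': 9}
```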
