Text and Image data training - machine-learning

I am new to machine learning. I am working on a project that recognizes fruits and vegetables from a picture and then reports their health benefits. I am following this project as a reference:
https://github.com/animesh1012/machineLearning/tree/main/Fruit_Vegetable_Recognition
Here is the dataset:
https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition
I have a text dataset that lists the health benefits of each fruit and vegetable. When the model detects a fruit, I want to load the text that describes its benefits. I have looked into different methods, but I seem to be on the wrong track or missing something. How can I do this?
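One possible way to connect the two pieces, as a minimal sketch: treat the classifier's output as an index into a list of class names, then use that name as a key into the benefits text. The class list, the CSV column names (`name`, `benefit`), the placeholder texts, and the score vector are all assumptions made for illustration; the real model from the linked project would supply the scores.

```python
import csv
import io

# Hypothetical class names, in the same order as the model's output indices.
CLASS_NAMES = ["apple", "banana", "carrot"]

# Stand-in for the benefits text file (assumed columns: name, benefit).
# A real project would open the CSV from disk instead.
BENEFITS_CSV = """name,benefit
apple,(placeholder benefit text for apple)
banana,(placeholder benefit text for banana)
carrot,(placeholder benefit text for carrot)
"""


def load_benefits(csv_text):
    """Return a dict mapping a lower-cased item name to its benefit text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"].strip().lower(): row["benefit"].strip() for row in reader}


def describe(scores, class_names, benefits):
    """Pick the highest-scoring class and look up its benefit text."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    label = class_names[best_index]
    return label, benefits.get(label.lower(), "No benefit text found.")


benefits = load_benefits(BENEFITS_CSV)
# Pretend these are the per-class scores returned by the image model.
label, text = describe([0.1, 0.7, 0.2], CLASS_NAMES, benefits)
print(label, "->", text)
```

The key point is that the image model and the text data stay separate: the model only has to produce a label, and the label spelling in the text file has to match the class names used during training.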

Related

Recoding Area labels based on zip code data in excel

Basically, I'm using SPSS and have a variable named AREA that includes different major counties in California. This variable has a value label "Other", and I want to recode those cases into their respective counties. I have an Excel sheet with all the zip codes that fall into those "Other" counties, and the data file contains a zip code variable as well. There are 400+ zip codes, so I'm looking for an easier alternative to manually typing each zip code into syntax to recode those values.
I've tried to find a way to reference the Excel workbook, but have come up empty-handed.
Any guidance or approaches to this problem would be appreciated!
The Excel sheet has unique zip codes, each with the corresponding Area value in the cell to its right. In the data set there may be multiple instances of each zip code.
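The general idea here is a table lookup rather than a hand-typed recode: load the zip-to-area sheet once as a lookup table, then replace "Other" wherever the case's zip code appears in it. The sketch below shows that logic in plain Python, not SPSS syntax; the zip codes, the county names, and the field names `zip` and `AREA` in the sample cases are made up for illustration.

```python
import csv
import io

# Stand-in for the Excel sheet saved as CSV: one row per unique zip code.
# Zip codes and county names here are hypothetical.
ZIP_TO_AREA_CSV = """zip,area
90001,County A
90002,County B
"""

# Stand-in for the data file; a zip code may appear more than once.
cases = [
    {"zip": "90001", "AREA": "Other"},
    {"zip": "90001", "AREA": "Other"},
    {"zip": "90002", "AREA": "Other"},
    {"zip": "94016", "AREA": "County C"},
    {"zip": "99999", "AREA": "Other"},  # not in the lookup, stays "Other"
]


def load_lookup(csv_text):
    """Build a dict from zip code to area name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["zip"].strip(): row["area"].strip() for row in reader}


def recode(rows, lookup):
    """Return copies of the rows with "Other" replaced where the zip is known."""
    result = []
    for row in rows:
        updated = dict(row)
        if row["AREA"] == "Other":
            updated["AREA"] = lookup.get(row["zip"], "Other")
        result.append(updated)
    return result


recoded = recode(cases, load_lookup(ZIP_TO_AREA_CSV))
for row in recoded:
    print(row["zip"], row["AREA"])
```

Because the lookup is keyed on unique zip codes, repeated zip codes in the data file are handled without any extra work.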

Creating Dynamic Sheet Cell Reference List for pulling numbers to SUM

I've been working on building a data analysis sheet, which is quite verbose at the moment and a bit more complicated than it should be, as I've been trying to figure this out as I go. Please note that I work with student data in a school.
Basically, I have two sets of input data:
Data imported from a CSV file that includes test data and codes for Common Core Standards and the questions tied to those standards as a whole class summary
Data imported from a CSV file that includes individual scores by question
I am looking to construct 2 views:
A view that collates and displays data of individual standards per student that includes a dropdown to change the standard allowing a teacher to see class performance by standard in a broad view. The drop-down is populated dynamically from the input data (so staff could eventually dump data and go directly to reports)
A view that collates and displays data of individual students broken down by performance on each standard, allowing a teacher to see the broader spectrum for each student. The student drop-down is populated from Source list 2.
I have been able to build the first view, but am struggling with the second. I've been able to separate the question codes and develop strings of cell references to the scoring data, including a dynamic reference to the row the selected student's score data appears on in the second source set from above.
I tried to pass through an indirect() formula into a sum() so as to process for a mean evaluation, and have encountered errors. I think SUM() doesn't process comma-separated cell reference lists from Indirect() [or in general] or there is something that I am missing to help parse it. Here is the formula I have tried:
=Sum(vlookup(D7,CCCodeManip!$A:$C,3,false))
CCCodeManip!C:C includes the created text (based on the dynamic standards and question codes, etc), here's an example of what would be found there:
'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17
I need these to be dynamic so that teachers can input different sets of standards, question, and student data and the sheet automatically collates and reports it in uniform ways (with an upward bound of 20 standards as I currently have it built)
Here is a link to the sheet I built, with names and ID anonymized. There's a CRAP TON of sub-tabs, and that's really just being able to split apart and re-combine data neatly without things error-ing out due to data overlapping, aside from a few different attempts and different approaches to parse the cell reference strings.
The first two tabs are the current status of the data views. I plan to hide a bunch of the functional stuff that is there to help pull data accurately.
The 3rd and 4th tab are the source data sets. 5th is a modified version of source data that allows me to reference things better, and I've tried to arrange the sheets most relevant towards the front of the set.
https://docs.google.com/spreadsheets/d/1fR_2n60lenxkvjZSzp2VDGyTUO6l-3wzwaV4P-IQ_5Y/edit?usp=sharing
Does someone have a different approach? I am aware that I might be as far as I can go with this and perhaps should consider scripts. My coding experience is a bit out of date and my strength is more with formulas, but I can dig into things with some direction, if anyone can help.
Ok so I noticed something.
It seems the failure is in the indirect reference:
=indirect(CCCodeManip!C3)
The string I am trying to parse via indirect is going to be generated into something like this, dynamic from reference to other data:
'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17
INDIRECT() returns a #REF error saying that the above string is not a cell reference.
Can someone give me a clue as to what is causing this? I am going to dig into the docs on Indirect() from google and will post anything that I find.
Perhaps it is that indirect() can't handle lists, but only specific references and arrays, which may require me to build a sheet to do the SUM formula on for each question set (?)
So I think I figured it out, but I ended up parsing the data differently: I do the sum from individual cell references with a separate SUM formula, bypassing the need to do it all at once. It just makes my sheets a lot dirtier! I may eventually see whether code could do it better, but this is closed for now.
Basically, I did individual cell references to recall scores in a row, then used a separate SUM formula, and created references / structures to be able to pull those sum() results. Achieves the same end, but with extra crap on the sheet.
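If this were moved to code, the failing step could be handled by splitting the generated string into individual references and resolving each one separately, which is the same thing the workaround does by hand. The sketch below is plain Python rather than a Sheets script; the `SHEETS` dictionary and its scores are invented stand-ins for the 'M-ADI' tab.

```python
import re

# Hypothetical stand-in for the spreadsheet: {tab name: {cell: value}}.
SHEETS = {
    "M-ADI": {"M17": 1, "N17": 0, "O17": 1, "P17": 1, "Q17": 0, "R17": 1, "J17": 1},
}

# Matches references of the form 'Tab Name'!A1 inside a longer string.
REF_PATTERN = re.compile(r"'([^']+)'!([A-Z]+[0-9]+)")


def resolve_references(ref_list, sheets):
    """Return the values for every 'Tab'!Cell reference found in ref_list."""
    return [sheets[tab][cell] for tab, cell in REF_PATTERN.findall(ref_list)]


refs = "'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17"
values = resolve_references(refs, SHEETS)
total = sum(values)
mean = total / len(values)
print(total, mean)
```

Each reference is looked up on its own, so the comma-separated list never has to be treated as a single reference.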

Export row level variable labels, value labels and value

Somewhat new to SPSS and wondering if it's possible to export into an Excel file a data map containing the variable label, value label and corresponding value at a row level. I know you can download a data map via Display Data File Information, but the variable label is a header rather than displayed on each row.
Example...
"what is your gender","male","1"
"what is your gender","female","2"
"primary car brand","Chevy","1"
"primary car brand","Buick","2"
"primary car brand","Fiat","3"
"primary car brand","Toyota","4"
"primary car brand","Kia","5"
Any assistance is appreciated.
See whether the CODEBOOK (Analyze > Reports > Codebook) procedure is more to your liking. Any Viewer table can be exported to Excel either via File > Export or using OMS. Or the exact format of the display you showed can be produced as a table using Python programmability. Details if you need to go that way.
In many use cases, the APPLY DICTIONARY (Data > Copy Data Properties) does what's needed without the need to create an external listing.
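For illustration only, here is the row-level layout from the question produced in plain Python. This sketch does not use the SPSS Python programmability API; `DATA_MAP` is a hand-written stand-in for the variable and value labels that would have to be read from the data file.

```python
import csv
import io

# Hypothetical stand-in for the file's dictionary:
# {variable label: {value: value label}}.
DATA_MAP = {
    "what is your gender": {1: "male", 2: "female"},
    "primary car brand": {1: "Chevy", 2: "Buick", 3: "Fiat", 4: "Toyota", 5: "Kia"},
}


def flatten(data_map):
    """Return one (variable label, value label, value) row per value."""
    rows = []
    for variable_label, value_labels in data_map.items():
        for value, value_label in sorted(value_labels.items()):
            rows.append((variable_label, value_label, str(value)))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten(DATA_MAP)
csv_text = to_csv(rows)
print(csv_text)
```

The resulting CSV text can be written to a file and opened directly in Excel.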

Trying to paste a lowest price from G2A onto google sheets

So I am trying to learn how to get the lowest selling price data from g2a games and put it onto google sheets. I tried the code below for Next car Game on G2A but get an error saying imported content is empty.
=IMPORTXML("https://www.g2a.com/next-car-game-early-acces-steam-cd-key-global.html","//div[#class=mp-Price]")
I am very new to this but hope to learn from it.
The price is loaded from a separate JSON file through jQuery. I found the source URL, but you will still have to parse that data and then compare the rates to find the lowest one.
=TRANSPOSE(IMPORTDATA("https://www.g2a.com/marketplace/product/auctions/?id=5759&v=0"))

Machine learning predict text fields based on text fields

I have been working on machine learning and prediction for about a month. I have tried IBM Watson with Bluemix, Amazon Machine Learning, and PredictionIO. What I want to do is predict a text field based on other fields. My CSV file has four text fields named Question, Summary, Description and Answer, and about 4500 lines/records. There are no numerical fields in the uploaded dataset. A typical record looks like this:
{'Question':'sys down','Summary':'does not boot after OS update','Description':'Desktop does not boot','Answer':'Switch to safemode and rollback last update'}
On IBM Watson I found a question in their forums and a reply saying that custom corpus upload is not possible right now. Then I moved to Amazon Machine Learning. I followed their documentation and was able to implement prediction in a custom app using the API. I tested on the MovieLens data, where everything is numerical. I successfully uploaded the data and got movie recommendations with their python-boto library. When I tried uploading my CSV file, the problem was that no text field could be selected as the target. Then I added numerical values corresponding to each value in the CSV. This approach made prediction work, but the accuracy was poor. Maybe the CSV needs to be formatted in a better way.
A record from the MovieLens data is pasted below. It says that userID 196 gave movieID 242 a three-star rating at time (Unix timestamp) 881250949.
196 242 3 881250949
Currently I am trying PredictionIO. A test on the MovieLens dataset using the recommendation template ran successfully, as described in the documentation. But it is still unclear to me whether a text field can be predicted from other text fields.
Does prediction run on numerical fields only, or can a text field be predicted based on other text fields?
No, prediction does not run only on numerical fields. The input could be anything, including text. My guess is that the MovieLens data uses IDs instead of actual user and movie names because
this saves storage space (this dataset has been around for a long time, and back then storage was definitely a concern), and
there is no need to know the actual user name (privacy concern)
For your case, you might want to look at the text classification template https://docs.prediction.io/demo/textclassification/ . You will need to model how you want each record to be classified.
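To make the idea concrete, here is a deliberately simple sketch that treats each known Answer as a class and picks the one whose record shares the most words with a new query (Jaccard similarity over word sets). This is a swapped-in toy technique, not the PredictionIO text classification template, and the second record is invented for illustration.

```python
import re

RECORDS = [
    {
        "Question": "sys down",
        "Summary": "does not boot after OS update",
        "Description": "Desktop does not boot",
        "Answer": "Switch to safemode and rollback last update",
    },
    {  # hypothetical second record, added only so there is something to compare
        "Question": "printer offline",
        "Summary": "printer not reachable on network",
        "Description": "Print jobs stay queued",
        "Answer": "Restart the print spooler",
    },
]

INPUT_FIELDS = ("Question", "Summary", "Description")


def tokens(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def record_tokens(record):
    """Combine the input text fields of a record into one word set."""
    return tokens(" ".join(record[field] for field in INPUT_FIELDS))


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_answer(query, records):
    """Return the Answer of the record most similar to the query."""
    query_tokens = tokens(query)
    best = max(records, key=lambda r: jaccard(query_tokens, record_tokens(r)))
    return best["Answer"]


predicted = predict_answer("desktop will not boot after update", RECORDS)
print(predicted)
```

With 4500 records a real classifier with proper text features would likely do better, but the shape is the same: text fields go in, and a text label comes out.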
