I have a COVID CSV file that I downloaded, and I would like to see the columns of data and pull out as much information as possible, but I'm stuck on the first step.
lines = spark.read.option("header", "true").csv("/home/adminp/Downloads/owid-covid-data.csv")  # the path could also be passed in via sys.argv[1]
lines.show()
Can anybody help me?
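For reference, here is a minimal sketch of a possible next step, assuming the same spark session and lines DataFrame as above; the column names are assumptions based on the public owid-covid-data.csv layout and may differ in your copy.

# List the columns and their types (all strings unless .option("inferSchema", "true") is added to the reader).
lines.printSchema()

# Pull out a few columns of interest.
lines.select("location", "date", "new_cases").show(10)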
I am using RapidMiner because I want to perform sentiment analysis.
The thing is, I have 7 queries (company names) that I need to analyze together to obtain insights about the customers.
So my idea was to extract the data with the Twitter developer app and then put it into RapidMiner to analyze.
When I open this data in RapidMiner, it reports problems with the dataset:
Error: file syntax error
Message: Value quotes not closed at position 346.
Last characters read: ght.
Help! How do I fix this?
Once I enter my spreadsheet data (.csv file), it shows me this error:
Cause: Unparseable number: "FALSE"
I've searched here already for answers, but none helped me solve this error.
Is it possible to analyze this data all together, or do I have to do it separately?
I'm not sure if that is feasible; I suppose it would interfere with the overall analysis?
I'm quite new to RapidMiner, so I appreciate everyone's help.
Thanks in advance.
I decided to ignore the problem: I just selected the option to replace errors with missing values and analyzed all the data together.
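If dropping the broken rows isn't acceptable, another option is to pre-clean the export before loading it into RapidMiner. This is just a sketch, assuming Python with pandas is available; the filenames tweets.csv and tweets_clean.csv are hypothetical stand-ins for the Twitter export.

import csv
import pandas as pd

# Read every column as text so values like "FALSE" stay strings instead of
# tripping a number parser, and skip rows whose quoting is broken.
df = pd.read_csv("tweets.csv", dtype=str, engine="python", on_bad_lines="skip")

# Re-export with every field quoted so the importer sees consistent, closed quotes.
df.to_csv("tweets_clean.csv", index=False, quoting=csv.QUOTE_ALL)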
I am currently working on two projects.
One has an Excel file of about 130 MB, and you can imagine how many records it contains.
The other uses Google Sheets with 25k+ records, and these will increase over time.
So how should I go about handling such large uploads in Rails?
I haven't found a detailed tutorial addressing this issue; if someone has one, please share it with me.
Kindly advise me on a strategy/gems that I should prefer.
Thanks.
Have you considered converting to CSV and then importing?
There's a tutorial and Gem for that: RailsCasts 396
First, export to CSV. Then split into smaller files, for example with
split data.csv
(OS X/Linux)
I'd implement the import as a rake task. You could also just generate db/seeds.rb with a bit of string manipulation.
Rails shouldn't have a problem with a 170MB file, but it's often annoying to wait for long-running tasks to finish, especially if you're still debugging.
Alternatively, you can probably import it much faster if you talk directly to MySQL. But you'll lose the convenience of Rails and should at least do a
# Re-save every imported record; save! raises if any validation fails.
Data.all.each do |datum|
  datum.touch
  datum.save!
end
to verify.
I am looking for a database that contains the 13F/13G filings in Quandl but can't find any.
Maybe I am not using the right keywords?
Any suggestions on where to find a curated dataset? I don't want to end up scraping EDGAR again.
Cheers!
I've been downloading these free curated data sets of Form 13F available as CSV, XLS, and JSON. They're formatted to be immediately analyzable.
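To give an idea of what "immediately analyzable" means here, a quick sketch of loading one of the CSV exports with pandas; the filename form13f.csv is hypothetical and stands in for whichever file you download.

import pandas as pd

# Load one 13F holdings export and take a first look at its columns and contents.
holdings = pd.read_csv("form13f.csv")
print(holdings.head())
print(holdings.describe(include="all"))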
I have an import functionality in which I upload a CSV/XLS file to import data. I have a 30,000-record CSV file (size: 3.4 MB).
This file takes 40 to 50 minutes to import.
For each record I store data into 3 tables.
I want to reduce the time it takes to import the data.
What should I do? Please help me.
Thanks in advance.
I'm going to assume that you are using tables in a database. A 3.4 MB file shouldn't take that long to import. In one of my recent projects, I had to import and parse 100 MB+ of dictionary files and it only took a minute (Python-based).
The time really depends on the code you have written, although there are some things to look for that will help reduce the import time.
The first is: don't print any values in loops. Printing generally uses up a good amount of time in any language.
Also, only open the database once; there's no need to close and reopen it while you're in the same function or problem space.
Use the executemany functionality when it is available, and when you are ready to commit the changes, commit them all at once.
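For example, with SQLite in Python it might look like the sketch below; the table and column names are made up for illustration.

import csv
import sqlite3

conn = sqlite3.connect("import.db")  # open the database once, not once per record

with open("records.csv", newline="") as f:
    rows = [(r["name"], r["email"]) for r in csv.DictReader(f)]

# One executemany call instead of 30,000 individual INSERT round trips,
# followed by a single commit for the whole batch.
conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
conn.commit()
conn.close()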
It would also be nice to see how you structured your import function; then I might be able to provide more details.
Edit:
See Improve INSERT-per-second performance of SQLite?
I need to import data into my app. Right now I do it via XLS spreadsheets, but in my case the file has about 80,000 rows and it is slow, so maybe it would be better to choose another format? For example, will XML data be faster to import?
XML is unlikely to be any faster - it still needs to be parsed as strings and converted.
80,000 rows is quite a lot. How long does it take you?
Edit:
You can make what's happening more visible by dropping puts statements into your code, with timestamps. It's crude, but you can then time between various parts of your code to see which part takes the longest.
Or better yet, have a go at using ruby-prof to profile your code and see where it is spending most of its time.
Either way, getting a more detailed picture of the slow-points is a Good Idea.
You may find there's just one or two bottlenecks that can be easily fixed.