I need large SMS or Twitter datasets. Does anybody know where I might be able to find such things?
Thanks.
You can use this:
http://www.infochimps.com/tags/twitter
Answering my own question:
There is an SMS spam collection for those who want a dataset similar to the one I was looking for. Here is the link: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
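For anyone loading that collection, here is a minimal sketch of a parser. It assumes the file holds one message per line, with the label (ham or spam) first and a tab before the message text; check the downloaded file before relying on that. The function name and the two sample lines are made up for illustration.

```python
from collections import Counter


def parse_sms_collection(lines):
    """Yield (label, text) pairs from 'label<TAB>message' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: the label comes first, separated from the text by one tab.
        label, _, text = line.partition("\t")
        yield label, text


# Two made-up lines in the assumed format, standing in for the real file.
sample = [
    "ham\tSee you at lunch?\n",
    "spam\tYou have won a prize, reply now\n",
]
messages = list(parse_sms_collection(sample))
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

With the real data you would pass an open file object instead of `sample`; the file name depends on what the download contains.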
Try the links below to get Twitter datasets:
https://data.world/datasets/twitter
https://archive.org/search.php?query=collection%3Atwitterstream&sort=-publicdate
You can also use the Twitter API to collect tweets.
Related
There is a Twitter user who posts valuable stuff and I want to save his tweets not as text but as images (screenshots) just as you see it on your phone or computer.
I installed python-twitter and tweepy, but so far I haven't found a solution in the docs or in the communities.
Alternatively: Is there another way to save tweets in a kind of pretty, visually appealing way?
Thank you in advance.
With those libraries you can only extract Twitter data; they aren't image processing libraries. You will have to write your own logic for how you want those pictures rendered.
Look into Pillow.
I usually use a site called tweetcyborg.com. It converts any tweet into an image.
I figured it out: use Selenium WebDriver to open the Twitter account page in the browser, scrape the tweets, and use a screenshot tool to turn each one into an image, looping through all tweets.
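A rough sketch of that Selenium approach, not the poster's actual script. The function name and the CSS selector for a tweet are assumptions (Twitter's markup changes often), and the driver is passed in so the function itself does not depend on a particular browser.

```python
import os


def save_tweet_screenshots(driver, profile_url, out_dir,
                           selector='article[data-testid="tweet"]'):
    """Open profile_url and save one PNG per element matching selector.

    `selector` is an assumed CSS selector for a single tweet; adjust it
    to whatever the page actually uses.
    """
    os.makedirs(out_dir, exist_ok=True)
    driver.get(profile_url)
    # "css selector" is the value of Selenium's By.CSS_SELECTOR constant.
    tweets = driver.find_elements("css selector", selector)
    saved = []
    for index, tweet in enumerate(tweets):
        path = os.path.join(out_dir, f"tweet_{index:04d}.png")
        # WebElement.screenshot returns True when the file was written.
        if tweet.screenshot(path):
            saved.append(path)
    return saved
```

To run it, create a driver with `from selenium import webdriver` and `driver = webdriver.Chrome()`, call `save_tweet_screenshots(driver, "https://twitter.com/<username>", "shots")`, then `driver.quit()`. Only tweets already rendered on the page are captured, so a long timeline would probably also need scrolling between passes.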
Does anyone know of a solution for checking if a tweet has replies or not, without checking the reply_count field of the JSON response?
I'm building a crawler and already have a method for scraping a timeline for tweets as well as replies to tweets. In order to increase efficiency I want to find out if a tweet has any replies at all before calling my reply method. I have a standard developer account with Twitter so I do not have access to reply_count.
Looking for this as well. The only way I found was scraping the page (which is against the Terms of Service):
reply-count-aria-${tweet.id_str}.*?(\d+) replies
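The pattern above can be sketched in Python as follows. It assumes the scraped HTML contains an element whose id ends in `reply-count-aria-<tweet id>`, followed on the same line by text such as "4 replies"; the function names, the sample markup and the singular "1 reply" form are assumptions added for illustration.

```python
import re


def reply_count_from_html(html, tweet_id):
    """Return the reply count found for tweet_id in html, or None if absent."""
    pattern = re.compile(
        r"reply-count-aria-" + re.escape(str(tweet_id))
        + r"(?!\d)"                     # do not match a longer id sharing this prefix
        + r".*?(\d+) repl(?:y|ies)"     # assumed wording: "1 reply" / "4 replies"
    )
    match = pattern.search(html)
    return int(match.group(1)) if match else None


def has_replies(html, tweet_id):
    """True if the page reports at least one reply for tweet_id."""
    count = reply_count_from_html(html, tweet_id)
    return count is not None and count > 0


# Made-up markup in the assumed shape.
sample_html = '<span id="profile-tweet-action-reply-count-aria-123">4 replies</span>'
print(reply_count_from_html(sample_html, "123"))
```

A result of `None` only means the pattern was not found, which can also happen when the page layout differs from the assumed one.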
I've searched through Stack Overflow, but the answers are dated. Does anyone know how to crawl a topic like security on Twitter? Do I just follow people who tweet about this topic, re-tweet and tweet new things, or is there a more exact way of doing this? I then need to run a statistical analysis on the data I gather.
You can use Puppeteer to crawl Twitter data.
Check out their GitHub repository here.
This is a repository that crawls Twitter data using Puppeteer.
How about using the Twitter search API (https://dev.twitter.com/docs/api/1.1/get/search/tweets)?
You first need to create (or rather, register) an app on dev.twitter.com and then use the search API to query for tweets that contain "security" (assuming I understood what you mean by crawling). Once you have your tweets, you can run your statistical analysis on the gathered data.
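As a sketch of what such a query could look like, the snippet below builds a request for the v1.1 search endpoint that the linked page documents, using only the standard library. It assumes app-only authentication with a bearer token obtained from the registered app; `YOUR_BEARER_TOKEN` and the function name are placeholders.

```python
import urllib.parse
import urllib.request

# Endpoint of the v1.1 search API referenced above.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_request(query, bearer_token, count=100):
    """Build (but do not send) a search request for tweets matching query."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    request = urllib.request.Request(f"{SEARCH_URL}?{params}")
    # Assumption: app-only auth, i.e. a bearer token from the registered app.
    request.add_header("Authorization", f"Bearer {bearer_token}")
    return request


request = build_search_request("security", "YOUR_BEARER_TOKEN")
print(request.full_url)
```

Sending it with `urllib.request.urlopen(request)` and decoding the JSON body should give the matching tweets under a `statuses` key, which you can then feed into the analysis.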
I use the twitteR package for R to crawl Twitter data (https://github.com/geoffjentry/twitteR). It includes simple and useful functions for getting Twitter data.
I want to use the new twitter embedded timelines (https://dev.twitter.com/docs/embedded-timelines) but how can I get the data for the right user by username? I have a lot of websites with users (>500) who have their username in the database but as far as I can see you have to give a data-widget-id to get the tweets from the right user.
Is there any way to do this by username? And if not, how can I quickly get a data-widget-id for each of my users into the database?
Any help is appreciated
Looks like the best you can do is get a single valid data-widget-id and use it for many different user names using the data-screen-name property.
There is a discussion here from one of the twitter devs. The same one as #alex_b posted in the comments.
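A sketch of the single-widget idea: generate the embed markup for every username from the database while reusing one data-widget-id. The anchor's class name and link text follow the embed snippet Twitter generated at the time, as far as I remember it, so treat them as assumptions; the widget id and usernames below are placeholders.

```python
from html import escape


def timeline_embed(screen_name, widget_id):
    """Return the embed anchor for screen_name, reusing one shared widget id."""
    name = escape(screen_name, quote=True)
    wid = escape(str(widget_id), quote=True)
    return (
        f'<a class="twitter-timeline" href="https://twitter.com/{name}" '
        f'data-widget-id="{wid}" data-screen-name="{name}">Tweets by @{name}</a>'
    )


# Placeholder widget id and usernames, for illustration only.
WIDGET_ID = "000000000000000000"
for username in ["alice", "bob"]:
    print(timeline_embed(username, WIDGET_ID))
```

The page still has to load Twitter's widget script for the anchors to turn into timelines.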
You can write the same script for PHP. For the Rails community, this gist could be helpful for getting the widget-id dynamically:
https://gist.github.com/shah743/dd042df63a8f307f16ed
I studied the Twitter API documentation today and only found that we can use "Twitter REST API Method: statuses user_timeline" to get the statuses of a certain user. Retweets are stripped out of user_timeline for backwards-compatibility reasons. If I want retweets included, the documentation recommends "statuses retweeted_by_me", but retweeted_by_me cannot return the retweets made by other users.
I think we could parse the Twitter web page of a certain user to get their retweets. However, is there a more elegant way to crawl the retweets of a certain user?
Thanks in advance!
This was addressed recently by the Twitter devs. You can now add include_rts=true to your call to user_timeline. See the full discussion here: http://groups.google.com/group/twitter-development-talk/browse_thread/thread/7a4be385ff549ed0
You want to use the retweeted_to_me API call and then create a union with user_timeline and sort by datetime. It's a little annoying that they don't mix the stream for you.
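The union-and-sort step could look roughly like this. It assumes each status is a dict with an `id` and a `created_at` string in the format Twitter's REST API returns (for example "Mon Sep 06 10:00:00 +0000 2010"); the function name and sample statuses are made up.

```python
from datetime import datetime

# Assumed format of the created_at field in REST API responses.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def merge_statuses(*status_lists):
    """Union several status lists (deduplicated by id), newest first."""
    by_id = {}
    for statuses in status_lists:
        for status in statuses:
            by_id[status["id"]] = status
    return sorted(
        by_id.values(),
        key=lambda s: datetime.strptime(s["created_at"], CREATED_AT_FORMAT),
        reverse=True,
    )


# Made-up data standing in for the two API responses.
timeline = [
    {"id": 1, "created_at": "Mon Sep 06 10:00:00 +0000 2010", "text": "own tweet"},
    {"id": 3, "created_at": "Wed Sep 08 09:30:00 +0000 2010", "text": "newer own tweet"},
]
retweets = [
    {"id": 2, "created_at": "Tue Sep 07 18:15:00 +0000 2010", "text": "RT something"},
]
merged = merge_statuses(timeline, retweets)
print([s["id"] for s in merged])
```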
Call statuses/user_timeline for the specific user; then, for each status, you will have to call either statuses/id/retweeted_by or statuses/retweets.
http://apiwiki.twitter.com/Twitter-REST-API-Method:-GET-statuses-id-retweeted_by
http://apiwiki.twitter.com/Twitter-REST-API-Method:-statuses-retweets
You have to manually call GET statuses/retweets/:id for every tweet from the user_timeline.
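A sketch of that per-tweet loop. The URL template assumes the v1.1 form of GET statuses/retweets/:id, and `fetch_json` is a hypothetical callable you supply that performs the authenticated request and returns the decoded JSON.

```python
# Assumed v1.1 URL form of GET statuses/retweets/:id.
RETWEETS_URL = "https://api.twitter.com/1.1/statuses/retweets/{tweet_id}.json"


def collect_retweets(timeline, fetch_json):
    """Map each tweet id in timeline to the retweets returned for it.

    `fetch_json` is a hypothetical helper: it takes a URL and returns
    the parsed JSON response (authentication is left to the caller).
    """
    retweets = {}
    for status in timeline:
        url = RETWEETS_URL.format(tweet_id=status["id_str"])
        retweets[status["id_str"]] = fetch_json(url)
    return retweets
```

This makes one request per tweet, so a long timeline can use up the rate limit quickly.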