Print tweet full_text from tweet raw_data in process_data - twitter

I am streaming tweets from a certain user ID and using a method, process_data, to print each streamed tweet.
How can I get the tweet's full text from raw_data? Below is my code:
class MaxListener(tweepy.StreamListener):
    def on_data(self, raw_data):
        self.process_data(raw_data)
        return True
    def process_data(self, raw_data):
        print(raw_data)
    def on_error(self, status_code):
        if status_code == 420:
            return False
Below is how the raw data is printed
{"created_at":"Fri May 07 06:31:29 +0000 2021","id":1390554865030647808,"id_str":"1390554865030647808","text":"On May 7, 1895 \u2013 In Saint Petersburg, Russian scientist Alexander Popov demonstrates to the Russian Physical and Ch\u2026 https:\/\/t.co\/FhCbeYTQjX","source":"\u003ca href=\"https:\/\/mobile.twitter.com\" rel=\"nofollow\"\u003eTwitter Web App\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":824740868011745281,"id_str":"824740868011745281","name":"Wincelee","screen_name":"Wincelee2","location":"Maasai Mara University Kenya","url":"https:\/\/sites.google.com\/view\/manu-website\/home","description":"I wanna die on Mars too just not on impact","translator_type":"none","protected":false,"verified":false,"followers_count":40,"friends_count":385,"listed_count":0,"favourites_count":21,"statuses_count":362,"created_at":"Thu Jan 26 22:08:40 +0000 
2017","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"FAB81E","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1053591683785543680\/v15uC5Hy_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1053591683785543680\/v15uC5Hy_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/824740868011745281\/1512946471","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null,"withheld_in_countries":[]},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"On May 7, 1895 \u2013 In Saint Petersburg, Russian scientist Alexander Popov demonstrates to the Russian Physical and Chemical Society his invention, the Popov lightning detector\u2014a primitive radio receiver.","display_text_range":[0,201],"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]}},"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/FhCbeYTQjX","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/1390554865030647808","display_url":"twitter.com\/i\/web\/status\/1\u2026","indices":[117,140]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1620369089740"}
Below is what I have tried in the process_data method:
def process_data(self, raw_data):
    # print(raw_data)
    tweet = json.loads(raw_data)
    for tweetInfo in tweet:
        print(tweetInfo.full_text, "\n")
But I am getting the following error:
AttributeError: 'str' object has no attribute 'full_text'

I'm not sure how your code is working, but if raw_data is as posted, then after parsing it with tweet = json.loads(raw_data) it's just
print(tweet['extended_tweet']['full_text'])
In your code, for tweetInfo in tweet: treats tweet as a list (or other iterable). Since tweet is a dict here, the loop walks over its keys, which are strings, hence the AttributeError. Or were you trying to iterate over tweet.keys()?
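To make that concrete, here is a minimal sketch of a helper that process_data could call. It assumes the payload layout shown in the question (long tweets carry extended_tweet.full_text, short ones only text); the function name and the trimmed sample payloads are made up for illustration.

```python
import json


def extract_full_text(raw_data: str) -> str:
    """Return the full tweet text from a raw streaming payload (a JSON string)."""
    tweet = json.loads(raw_data)  # raw_data is a str, so parse it into a dict first
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    # Tweets that were not truncated have no extended_tweet; fall back to text.
    return tweet.get("text", "")


# Hypothetical, trimmed-down payloads modelled on the output in the question.
long_tweet = json.dumps({
    "text": "On May 7, 1895 \u2013 In Saint Petersburg\u2026",
    "truncated": True,
    "extended_tweet": {
        "full_text": "On May 7, 1895 \u2013 In Saint Petersburg, Alexander Popov demonstrates his invention."
    },
})
short_tweet = json.dumps({"text": "hello", "truncated": False})

print(extract_full_text(long_tweet))
print(extract_full_text(short_tweet))
```

Inside the listener, process_data would then reduce to print(extract_full_text(raw_data)).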

Related

Tweepy Streamer Limiting Tweets to 140 Characters

I created a tweepy listener to collect tweets into a local MongoDB during the first presidential debate, but I have realized that the tweets I have been collecting are limited to 140 characters, and many are cut off at that limit. In my stream I had defined tweet_mode='extended', which I thought would resolve this issue; however, I am still not able to retrieve the full text of tweets longer than 140 characters. Below is my code:
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# Create a listener MyListener that streams and stores tweets to a local MongoDB
class MyListener(StreamListener):
    def __init__(self):
        super().__init__()
        self.list_of_tweets = deque([], maxlen=5)
    def on_data(self, data):
        try:
            tweet_text = json.loads(data)
            self.list_of_tweets.append(tweet_text)
            self.print_list_of_tweets()
            db['09292020'].insert_one(tweet_text)
        except:
            None
    def on_error(self, status):
        print(status)
    def print_list_of_tweets(self):
        display.clear_output(wait=True)
        for index, tweet_text in enumerate(self.list_of_tweets):
            m='{}. {}\n\n'.format(index, tweet_text)
            print(m)
debate_stream = Stream(auth, MyListener(), tweet_mode='extended')
debate_stream = debate_stream.filter(track=['insert', 'debate', 'keywords', 'here'])
Any input into how I can obtain the full extended tweet via this listener would be greatly appreciated!
tweet_mode=extended has no effect on the legacy standard streaming API, as Tweets are delivered in both truncated (140) and extended (280) form by default.
So you'll want your Stream Listener set up like this:
debate_stream = Stream(auth, MyListener())
What you should be seeing is that the JSON object for longer Tweets has a text field of 140 characters, but contains an additional dictionary called extended_tweet which in turn contains a full_text field with the full Tweet text.
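As a rough sketch of how on_data could pick the longer text out of the parsed dictionary before storing it: the helper name below is hypothetical, and it assumes that retweets nest the original tweet under a retweeted_status key, which may itself carry extended_tweet. Treat both as assumptions to check against your own stored documents.

```python
def full_text_of(tweet: dict) -> str:
    """Return the longest available text from an already-parsed tweet dict."""
    # Assumption: a retweet wraps the original tweet in "retweeted_status".
    source = tweet.get("retweeted_status", tweet)
    extended = source.get("extended_tweet", {})
    # Prefer the extended form; otherwise the plain text is already complete.
    return extended.get("full_text") or source.get("text", "")


# Hypothetical sample documents for illustration only.
plain = {"text": "short tweet"}
extended = {"text": "cut off\u2026", "extended_tweet": {"full_text": "the whole long tweet"}}
retweet = {"text": "RT @someone: cut off\u2026", "retweeted_status": extended}

for sample in (plain, extended, retweet):
    print(full_text_of(sample))
```

Note that for a retweet this returns the original tweet's text without the "RT @user:" prefix; whether that is what you want to store is a design choice.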
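You can try reading the field inside on_data, after json.loads(data), rather than on the Stream object (which has no extended_tweet attribute):
full_text = tweet_text["extended_tweet"]["full_text"]
Not sure if this covers every case, since shorter tweets carry no extended_tweet key, but try it out.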

Check if String Contains an Emoji in Ruby

In ruby, here is how you can check for a substring in a string:
str = "hello world"
str.include?("lo")
=> true
When I am attempting to save an emoji in a text column in a rails application (the text column within a mysql database is utf8), it comes back with this error:
Incorrect string value: \xF0\x9F\x99\x82
For my situation in a rails application, it suffices to see if an emoji is present in the submitted text. If an emoji is present: raise a validation error. Example:
class MyModel < ApplicationRecord
  validate :cannot_contain_emojis
  private
  def cannot_contain_emojis
    if my_column.include?("/\xF0")
      errors.add(:my_column, 'Cannot include emojis')
    end
  end
end
Note: The reason I am checking for \xF0 is that, according to this site, all or most emojis appear to begin with this byte.
This however does not work. It continues to return false even when it is true. I'm pretty sure the issue is that my include statement doesn't work because the emoji is not converted to bytes for the comparison.
Question
How can I make a validation to check that an emoji is not passed in?
Example bytes for a smiley face in UTF8: \xF0\x9F\x99\x82
You can use the Emoji Unicode property to test for Emoji using a Regexp, something like this:
def cannot_contain_emojis
  if /\p{Emoji}/ =~ my_column
    errors.add(:my_column, 'Cannot include emojis')
  end
end
Unicode® Technical Standard #51 "UNICODE EMOJI" contains a more sophisticated regex:
\p{RI} \p{RI}
| \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
(\x{200D} \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
)*
[Note: some of those properties are not implemented in Onigmo / Ruby.]
However, checking for Emojis is probably not going to be enough. It is pretty clear that your text processing is broken at some point. And if it is broken by an Emoji, then there is a chance it will also be broken by my name, or the name of Ruby's creator 松本 行弘, or by the completely normal English word “naïve”.
Instead of playing a game of whack-a-mole trying to detect every Emoji, mathematical symbol, Arabic letter, typographically correct punctuation mark, etc., it would be much better to simply fix the text processing.
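One likely culprit, offered as an assumption rather than a diagnosis: the bytes in the error, \xF0\x9F\x99\x82, form a 4-byte UTF-8 sequence, and MySQL's legacy utf8 character set stores at most 3 bytes per character (utf8mb4 is the 4-byte variant). If that is the cause, the test that matters is "does this character need 4 bytes", not "is this an emoji". A small sketch of that check, written in Python rather than Ruby purely for illustration, with a hypothetical function name:

```python
def chars_needing_four_bytes(text: str) -> list:
    """Return the characters in text whose UTF-8 encoding is 4 bytes long."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


print(chars_needing_four_bytes("hi \U0001F642"))        # the smiley from the question
print(chars_needing_four_bytes("na\u00efve \u677e\u672c \u884c\u5f18"))  # 2- and 3-byte characters only
```

Under that assumption, names such as 松本 行弘 and words such as “naïve” pass through untouched, and the more durable fix is converting the column to utf8mb4 so no validation is needed at all.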
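I found Jörg's solution was only working when passing in the string itself and not a variable. Not sure why that is. (In the session below, "1f383" matches at index 0 most likely because \p{Emoji} also covers the ASCII digits, # and *, not because the string is an emoji.)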
/\p{Emoji}/ =~ "🎃"
=> 0
value = "1f383"
=> "1f383"
/\p{Emoji}/ =~ value
=> 0
/\p{Emoji}/ =~ "hello"
=> nil
Regardless I'd recommend using the unicode-emoji gem, as its approach is comprehensive. Its source code and documentation can be found on GitHub.

How do I remove the leading status id from a rails log

I wrote my own logger that writes a line as follows:
@my_log.info(message)
Where message would be something like:
{"action": "follow", "object": "Product"}.to_json
The log line leads with the word INFO, thereby making the line not pure JSON:
INFO -- : {"action":"follow","object":"Product"}
Is there a way to use the rails logging mechanism but leave off that leading log level so that the entire file will be json?
You can do it with formatter method, from the docs:
You may change the date and time format via datetime_format=.
logger.datetime_format = '%Y-%m-%d %H:%M:%S'
# e.g. "2004-01-03 00:54:26"
Or, you may change the overall format via the formatter= method.
logger.formatter = proc do |severity, datetime, progname, msg|
  "#{datetime}: #{msg}\n"
end
# e.g. "2005-09-22 08:51:08 +0900: hello world"
That example should work almost as is for your case, since the severity is no longer printed. It still prepends the datetime, though, so if every line must be pure JSON, return only "#{msg}\n" from the proc.
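For comparison, the same "message only" idea can be sketched with Python's standard logging module. This is not the Rails API; the logger name and the in-memory stream standing in for the log file are illustrative assumptions.

```python
import io
import json
import logging

stream = io.StringIO()  # stands in for the log file
handler = logging.StreamHandler(stream)
# "%(message)s" drops the level, timestamp and logger name from each line.
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("json_only")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the root logger from adding its own format

logger.info(json.dumps({"action": "follow", "object": "Product"}))
print(stream.getvalue(), end="")
```

Every line written this way parses as JSON on its own, which is the property the question asks for.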

Parse web scraped data

So I have this coming from my web scrape:
pastebin.com/CMrFcBMX
What I want is all the prices and ticket descriptions. Here's what I have:
doc.xpath("//script[@type='text/javascript']/text()").each do |text|
  if text.content =~ /more_options_on_polling/
    price1 = text.to_s.scan(/\"(formatted_(?:total_price))\":\"(.+?)\"/).uniq
    description = text.to_s.scan(/\"(ticket_desc)\":\"(.+?)\"/).uniq
    price = price1 + description
    render json: price
  end
end
So this is what I have at the moment. However, I need to make some major edits.
Firstly, I need descriptions containing a plus symbol to be ignored, e.g. Later Owl + Chance For VIP Upgrade\ would need to be ignored.
Secondly, I need to render the JSON nicely, so that the first price and fees match up with the first description.
Once I have this rendered I should be sorted. I'll be using this in a JS file afterwards, so a format like this would be best:
Ticket{
[
Price:
Fees:
Description:
]
}
Once I have it like this I should be good to finish my application ^_^
Thanks
Sam
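A rough sketch of the pairing step, in Python for illustration. The sample string is invented (the real data sits behind the pastebin link and is not reproduced here), and only the two field names that appear in the question's regexes are used; the fees field is left out because its name is not shown.

```python
import json
import re

# Hypothetical sample standing in for the scraped script text.
script_text = (
    '{"formatted_total_price":"$25.00","ticket_desc":"Early Bird"},'
    '{"formatted_total_price":"$40.00","ticket_desc":"Later Owl + Chance For VIP Upgrade"},'
    '{"formatted_total_price":"$30.00","ticket_desc":"General Admission"}'
)

prices = re.findall(r'"formatted_total_price":"(.+?)"', script_text)
descriptions = re.findall(r'"ticket_desc":"(.+?)"', script_text)

# Pair by position, then drop any ticket whose description contains "+".
tickets = [
    {"Price": price, "Description": desc}
    for price, desc in zip(prices, descriptions)
    if "+" not in desc
]
print(json.dumps(tickets, indent=2))
```

Pairing by position only holds if every ticket has both fields in the same order, and it would break if duplicates were removed from either list first (as .uniq does), so deduplicate after pairing if at all.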

DateTime from WebApi to Breeze is transformed with the Time Localization

I have a form on which I set a start date and a finish date for an entity.
On the Web API side, before saving the dates to the database, I set the start date to 2013-09-25 00:00:00.000 and the end date to 2013-09-26 23:59:58.000.
var vote = (VotingSet)Entity;
vote.Start = new DateTime(vote.Start.Year, vote.Start.Month, vote.Start.Day, 0, 0, 0, 0);
vote.End = new DateTime(vote.End.Year, vote.End.Month, vote.End.Day, 23, 59, 58);
The JSON that is sent to the REST service looks like this:
Start: "2013-09-25T00:00:00.000Z"
End: "2013-09-26T00:00:00.000Z"
After the save, in the javascript client, the entity is updated with the new key and with the properties that come from the server.
The observable date objects will have the following value
Start: Wed Sep 25 2013 03:00:00 GMT+0300 (GTB Daylight Time)
End: Fri Sep 27 2013 02:59:58 GMT+0300 (GTB Daylight Time)
This is what I am getting back from the server:
Start: "2013-09-25T00:00:00.000"
End: "2013-09-26T23:59:58.000"
How can I make sure that the hours in my object are not modified?
EDIT:
There is a good explanation here of what's happening with the datetime in JavaScript.
In the end I used this snippet to solve my problem:
breeze.DataType.parseDateFromServer = function (source) {
    var date = moment(source);
    return date.toDate();
};
It overrides Breeze's own function, which adds a time offset to the datetime.
Breeze does not manipulate the datetimes going to and from the server in any way EXCEPT to add a UTC timezone specifier to any dates returned from the server that do not already have one. This is only done because different browsers interpret dates without a timezone specifier differently, and we want consistency between browsers.
This is discussed in more detail in the answer posted here.
You are passing ISO8601 formatted timestamps, which is good. When you pass the Z at the end, you are indicating that the timestamp represents UTC. When you load those into JavaScript, it's going to take that into account.
You still need to show more code if you are looking for a useful response. What you've currently described from .NET doesn't quite line up with the timestamps you've provided. And it seems like most of this problem has to do with JavaScript and you haven't yet shown any of that code, so I can only guess what you might be doing. Please update your question, and understand that we have no knowledge of your system other than what you show us.
It's possible you may find moment.js to be useful in this scenario, but I can't elaborate further without seeing the relevant JavaScript code.
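The three-hour shift in the question can be reproduced outside the browser. The sketch below, in Python for illustration, treats a timestamp without an offset as UTC (as the first answer describes Breeze doing) and then displays it in a fixed UTC+3 zone standing in for GTB Daylight Time. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_as_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a missing offset as UTC."""
    if stamp.endswith("Z"):
        # Older versions of fromisoformat() do not accept a trailing "Z".
        stamp = stamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


gtb_summer = timezone(timedelta(hours=3))  # fixed UTC+3 stand-in for the client zone

start = parse_as_utc("2013-09-25T00:00:00.000")
end = parse_as_utc("2013-09-26T23:59:58.000")
print(start.astimezone(gtb_summer))  # 2013-09-25 03:00:00+03:00
print(end.astimezone(gtb_summer))    # 2013-09-27 02:59:58+03:00
```

The instants are unchanged; only the local rendering moves, which matches the Start and End values observed on the client.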
