Python Tweet Sentiment

Tweet sentiment analysis using Python

So what is the project?

For our first project we have decided to build a tweet sentiment analyzer. The plan is for it to take in tweets, store them in a SQL database, and then use a separate script with AI natural language processing to analyze them. We will score how positive, negative, and neutral each tweet is, then graph and analyze the results. We are hoping to see how the trend of positive and negative tweets relates to the price of, let's say, Bitcoin. If you want to learn more about us, why not check out our home page.

import tweepy as tw
import pandas as pd
import mysql.connector
from mysql.connector import Error
import openpyxl

# your Twitter API key and API secret
my_api_key = "your key"
my_api_secret = "your secret"
# authenticate
auth = tw.OAuthHandler(my_api_key, my_api_secret)
api = tw.API(auth, wait_on_rate_limit=True)
HashtagToSearch = "#Bitcoin"
search_query = HashtagToSearch + " -filter:retweets"

TotalReqForTweets = 5
# get tweets from the API
tweets = tw.Cursor(api.search_tweets,
                   q=search_query,
                   lang="en",
                   since="2022-01-01").items(TotalReqForTweets)
# store the API responses in a list
tweets_copy = []
for tweet in tweets:
    tweets_copy.append(tweet)

print("Total Tweets fetched:", len(tweets_copy))

tweets_df = pd.DataFrame()
# lists that will hold the cleaned fields for the database insert
txt = []
TTime = []
Dates1 = []
locs = []
sources = []
UName = []
OrigTweets = []
worked = 0  # count of tweets that passed the spam filter
for tweet in tweets_copy:
    hashtags = []
    try:
        for hashtag in tweet.entities["hashtags"]:
            hashtags.append(hashtag["text"])
        # fetch the full, untruncated tweet text
        text = api.get_status(id=tweet.id, tweet_mode='extended').full_text
    except Exception:
        # fall back to the (possibly truncated) text included with the search result
        text = tweet.text

    tweets_df = pd.concat([tweets_df,
                           pd.DataFrame({'user_name': tweet.user.name,
                                         'user_location': tweet.user.location,
                                         'user_description': tweet.user.description,
                                         'user_verified': tweet.user.verified,
                                         'date': str(tweet.created_at),
                                         'text': text,
                                         'hashtags': [hashtags if hashtags else None],
                                         'source': tweet.source})],
                          ignore_index=True)

    # crude spam filter: skip tweets containing common bot phrases, links, or duplicates
    if not (
            "join" in text.lower() or "find" in text.lower() or "unlock" in text.lower() or "http" in text.lower() or text in OrigTweets):
        OrigTweets.append(text)
        worked = worked + 1
        # strip any characters outside a simple ASCII whitelist
        text = ''.join(
            filter(lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@-.?!# \n', text))
        txt.append(text)
        # split the created_at timestamp into separate date and time strings
        Dup = str(tweet.created_at).split(" ")
        Dates1.append(Dup[0])
        T12 = Dup[1].split("+")
        TTime.append(T12[0])
        sources.append(tweet.source)
        UName.append(tweet.user.name)
        locs.append(str(tweet.user.location))
        tweets_df = tweets_df.reset_index(drop=True)

    else:
        print("Spam Deteced - " + text)

# collapse repeated whitespace in each kept tweet
Cltxt = []
for i in txt:
    Cltxt.append(" ".join(i.split()))

for i in Cltxt:
    print(i)
    print("-----------------------------------------------------")

try:
    connection = mysql.connector.connect(host='YourHost',
                                         database='YourDB',
                                         user='YourUN',
                                         password='YourPass')
    if connection.is_connected():
        db_Info = connection.get_server_info()
        print("Connected to MySQL Server version ", db_Info)
        cursor = connection.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)

        for i in range(0, len(Cltxt)):
            Hs = HashtagToSearch
            tx = Cltxt[i] 
            UN = UName[i]
            D = Dates1[i]
            lo = locs[i]
            TT = TTime[i]
            So = sources[i]

            cursor = connection.cursor()
            sql_insert_query = """INSERT INTO    Tweets(Hashtag,TweetText,UserName,TweetDate,TweetTime, TweetSource)
            VALUES (%s,%s,%s,%s,%s,%s)"""

            insert_tuple_1 = (Hs, tx, UN, D, TT, So)
            cursor.execute(sql_insert_query, insert_tuple_1)
            connection.commit()

        print("Table connected succsesfully")
    print(worked)
    print(TotalReqForTweets)

except Error as e:
    print("Error while connecting to MySQL", e)

Mining the data

The first part of being able to determine the sentiment of tweets was getting the actual tweets. To do this we used the Tweepy library for the Twitter API, which allowed us to mine tweets containing a certain hashtag. For the first part of the project our chosen hashtag is #Bitcoin. We are hoping to be able to mine at a rate of 200 tweets an hour. Unfortunately, most of these tweets come from bots.

We separated these into good bots and bad bots. A good bot is something like a price updater, while a bad bot is a "buy this product", get-rich-quick, or "join my crypto wallet" type page. To get rid of the bad ones we identified keywords that appear most often in bot text, such as “join” or “easy money”, along with other telltale excerpts. We found this method, combined with filtering on the tweet source (user agent), to be extremely effective. The source filtering is most often done in the SQL query used to pull the data back out, roughly as sketched below.
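
For illustration, here is a minimal sketch of what that filtering query could look like, assuming the same Tweets table and placeholder connection details as above; the allowed sources and blocked keywords are only example values, not our final list.

import mysql.connector

connection = mysql.connector.connect(host='YourHost',
                                     database='YourDB',
                                     user='YourUN',
                                     password='YourPass')
cursor = connection.cursor()

# keep tweets posted from mainstream clients and drop rows containing obvious bot phrases;
# the source names and keywords below are placeholders, not a definitive list
select_query = """SELECT TweetText, TweetDate, TweetTime
                  FROM Tweets
                  WHERE TweetSource IN ('Twitter for iPhone', 'Twitter for Android', 'Twitter Web App')
                    AND TweetText NOT LIKE '%join%'
                    AND TweetText NOT LIKE '%easy money%'"""
cursor.execute(select_query)
rows = cursor.fetchall()
print("Tweets kept after filtering:", len(rows))

cursor.close()
connection.close()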

Analyzing the data

So this is the next part of our journey. We have not made much progress here yet, but we are going to attempt to generate and analyze the data in such a way that it acts as a leading indicator for the crypto price. We are still figuring out exactly how we will make the information available to you, but we are working on it and are sure we will figure something out soon! Any advice on the matter would be greatly appreciated. Drop us a comment below!
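
We have not settled on a library yet, but as a rough sketch of what the scoring step could look like, something like NLTK's VADER analyzer would give each tweet positive, neutral, negative, and compound scores. The sample tweets below are made up, and VADER is just one candidate, not a final decision.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of the VADER lexicon

analyzer = SentimentIntensityAnalyzer()

# sample tweets standing in for rows pulled back out of the Tweets table
sample_tweets = ["Bitcoin is looking great today!",
                 "Lost so much money on bitcoin this week"]

for tweet_text in sample_tweets:
    scores = analyzer.polarity_scores(tweet_text)
    # scores contains 'pos', 'neu', 'neg' and a 'compound' value between -1 and 1
    print(tweet_text, scores)

From there, the idea would be to average the compound score per day and plot it against the Bitcoin price to see whether sentiment acts as a leading indicator.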