This is the machine learning Python program that I created to model sunsets and tweet a daily sunset prediction based on the current weather metrics.

The following three functions collect location and contact info for first-time users. They will be used in the next section of code.

We will first check to see if we have an existing user file on record, which contains the country, zip code, weather API credentials, email info, and Twitter credentials. If it is the user’s first time running the program and thus the attempt to access said file fails, the first-time user will be instructed to enter their data and given the option to email predictions, tweet predictions, or both. This data is in turn exported as a csv file for future use.
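
In outline, the setup logic looks something like the sketch below. The file name, field names, and the `get_user` helper are illustrative placeholders, not necessarily the names used in the real program:

```python
import csv
import os

USER_FILE = "user.csv"  # hypothetical file name
FIELDS = ["country", "zip_code", "weather_api_key", "email", "twitter_key"]

def save_user(data, path=USER_FILE):
    """Write the user's settings to a csv file for future runs."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerow(data)

def load_user(path=USER_FILE):
    """Return saved settings, or None on a first run (file missing)."""
    try:
        with open(path, newline="") as f:
            return next(csv.DictReader(f))
    except FileNotFoundError:
        return None

def get_user():
    """Load the user file, prompting for details on a first run."""
    user = load_user()
    if user is None:
        user = {field: input(f"Enter {field}: ") for field in FIELDS}
        save_user(user)
    return user
```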

Below is the User csv file that is created when the program is run by first-time users. It stores the unique user data for future reference.

Next we will call the ‘Open Weather Map’ API, which is one of the most popular weather APIs. I am wrapping this in a function since we will need to call this API more than once.
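
The call can be sketched roughly as follows, using the endpoint and parameters from OpenWeatherMap’s current-weather API; the `owm_params` helper is just for illustration:

```python
import requests

def owm_params(zip_code, country, api_key):
    """Build the query parameters for the OpenWeatherMap request."""
    return {"zip": f"{zip_code},{country}", "appid": api_key, "units": "imperial"}

def call_owm(zip_code, country, api_key):
    """Fetch current conditions from OpenWeatherMap's 'weather' endpoint."""
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params=owm_params(zip_code, country, api_key),
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on bad credentials or zip code
    return resp.json()
```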

The goal is for our program to spit out a prediction exactly an hour before the sun sets. To accomplish this we need to collect some time data, the first piece of which is the current time. We then get the day’s sunset time, which is included in the API response we just received. From this sunset time we subtract an hour to determine our desired run time.
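
A sketch of the run-time calculation, assuming an OpenWeatherMap response with its `sys.sunset` Unix timestamp:

```python
from datetime import datetime, timedelta

def get_run_time(owm_json):
    """Derive our run time: one hour before today's sunset.

    OpenWeatherMap reports sunset as a Unix timestamp under sys.sunset.
    """
    sunset = datetime.fromtimestamp(owm_json["sys"]["sunset"])
    return sunset - timedelta(hours=1)
```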

The following block of code allows us to wait until our designated run time before the rest of our program commences. I programmed it to check every minute whether the current time equals the run time.
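
The waiting loop can be sketched like this, with the minute-level comparison factored into its own small helper:

```python
import time
from datetime import datetime

def is_run_time(now, run_time):
    """True once the clock reaches the run time, compared to the minute."""
    return (now.replace(second=0, microsecond=0)
            >= run_time.replace(second=0, microsecond=0))

def wait_until(run_time):
    """Poll once a minute until the run time arrives."""
    while not is_run_time(datetime.now(), run_time):
        time.sleep(60)
```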

Once we reach the run time for the day we need to call the first API again to get updated metrics. After calling the API we store our desired metrics as variables to be used when constructing our DataFrame.

Unfortunately, I’ve found that many of the real-time weather APIs have at least a few metrics which are consistently inconsistent with the actual conditions in the area. To try and get more accurate metrics I decided to incorporate multiple real-time APIs as data sources. My method was to compare the output of several APIs with the actual conditions and only keep the specific APIs and their metrics that were accurate.

In this next block of code I send a GET request to a second weather API called Weatherbit. This API has some interesting metrics such as Diffuse Horizontal Irradiance (DHI), which is the amount of radiation that reaches the surface after being scattered by particles in the atmosphere, and thus arrives via an indirect path. I’m not certain whether these more obscure weather metrics will actually have an effect on the sunset, but the goal is to test this with the model and keep the metrics that affect the sunset rating.
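
A sketch of that request, using the endpoint and parameter names from Weatherbit’s current-weather API; `extract_irradiance` is an illustrative helper for pulling out the metrics of interest:

```python
import requests

def call_weatherbit(zip_code, country, api_key):
    """Fetch current conditions (incl. irradiance metrics) from Weatherbit."""
    resp = requests.get(
        "https://api.weatherbit.io/v2.0/current",
        params={"postal_code": zip_code, "country": country, "key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"][0]  # Weatherbit wraps the observation in a list

def extract_irradiance(wb_obs):
    """Keep just the irradiance metrics we want to test in the model."""
    return {k: wb_obs.get(k) for k in ("dhi", "dni", "ghi")}
```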

It’s now time to consolidate the current weather metrics and store them as a single Pandas DataFrame.
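
For example, with pandas (the variable names are illustrative):

```python
import pandas as pd

def build_update(owm_metrics, wb_metrics):
    """Merge the metrics from both APIs into a single one-row DataFrame."""
    row = {**owm_metrics, **wb_metrics}
    return pd.DataFrame([row])
```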

Notice that we call the DataFrame we just created “update”. We need to keep our recently collected metrics separate from the previously collected metrics until after the prediction is made.

We use the previous metrics to create the model and then run the update metrics through the model to spit out a prediction.

First we need to get the previous metrics as I just mentioned. To do this we import the legacy csv file, which contains every previous day’s metrics and their respective sunset ratings, as a DataFrame called legacy. If it is our first time running the program we will have to create one.
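
A sketch of the load-or-create logic (the file and column names here are illustrative):

```python
import pandas as pd

LEGACY_FILE = "legacy.csv"  # hypothetical file name

def load_legacy(metric_columns, path=LEGACY_FILE):
    """Load the legacy table, creating an empty one on a first run."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        # First run: start an empty table with a column for the rating
        legacy = pd.DataFrame(columns=list(metric_columns) + ["rating"])
        legacy.to_csv(path, index=False)
        return legacy
```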

The function below can be called when it’s time to send out tweets and/or collect the public’s sunset ratings.

This next function combines all of the code necessary to gather the actual sunset rating from twitter users and store it in our legacy table so our model can improve. We will call on this function later.
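
In outline, assuming a tweepy-style `api` object with `update_status` and `mentions_timeline`; the reply-parsing helper is pure and illustrative:

```python
import re
import statistics

def gather_rating(api, prompt="Rate tonight's sunset from 1-10!"):
    """Tweet the rating prompt, then average the ratings in the replies."""
    status = api.update_status(prompt)
    replies = [t.text for t in api.mentions_timeline(since_id=status.id)]
    return parse_ratings(replies)

def parse_ratings(texts):
    """Average the first 1-10 number found in each reply, ignoring the rest."""
    scores = []
    for text in texts:
        match = re.search(r"\b(10|[1-9])\b", text)
        if match:
            scores.append(int(match.group(1)))
    return statistics.mean(scores) if scores else None
```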

In order to create our model we need to remove any rows from our legacy data with missing values, such as the days in which no ratings were made.

We need at least 8 rows of complete data to begin making predictions, so if we don’t have them yet we initiate the rating process without tweeting out a prediction.
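
Both steps together can be sketched as:

```python
import pandas as pd

MIN_ROWS = 8  # minimum complete rows before we trust a prediction

def ready_to_predict(legacy):
    """Drop incomplete rows; report whether we have enough data to model."""
    complete = legacy.dropna()
    return complete, len(complete) >= MIN_ROWS
```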

If we have at least 8 days worth of sunset data, we can run our legacy data through our model. We organize our inputs and outputs below.

To allow us to make meaningful comparisons between the importance of the various metrics for a good sunset, we need to standardize their values. To get a bit more technical, standardization lets us compare F-values, which quantify the degree to which a change in an input affects the output, on like terms.

This is important for metrics whose ranges are not remotely similar. For instance, it would mislead our model to compare the effect of atmospheric pressure in its non-standardized form, which has values mostly in the thousands, with the effect of a single-digit metric such as visibility.
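
One way to standardize is a simple z-score per column (scikit-learn’s `StandardScaler` does the equivalent):

```python
import pandas as pd

def standardize(X):
    """Z-score each column: zero mean, unit variance.

    This puts pressure (~1000s) and visibility (single digits) on the
    same scale so their effects can be compared.
    """
    return (X - X.mean()) / X.std()
```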

These next two blocks of code are used internally, solely to assess the effectiveness of our model and determine which metrics are significant and which have no meaningful correlation to the sunset rating and should thus be removed.
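
For instance, a univariate F-value per metric can be computed directly with NumPy; this follows the same formula scikit-learn’s `f_regression` uses, F = r²/(1 − r²) · (n − 2):

```python
import numpy as np
import pandas as pd

def f_values(X, y):
    """Univariate F-statistic per metric: higher means the metric explains
    more of the variation in the sunset rating."""
    n = len(y)
    scores = {}
    for col in X.columns:
        r = np.corrcoef(X[col], y)[0, 1]  # Pearson correlation with rating
        scores[col] = (r ** 2 / (1 - r ** 2)) * (n - 2)
    return pd.Series(scores).sort_values(ascending=False)
```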

There are additional methods that can be utilized to gauge the accuracy of the model such as ‘train test split’, which I will include as soon as I have enough data for it to be useful.

Now that we have fitted our model to our legacy data, we can prepare the current metrics of the day from our update DataFrame created earlier to make a sunset prediction.

Here we run our day’s metrics through our model and get our predicted sunset, which we will email and/or tweet.
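
A sketch of the fit-and-predict step using ordinary least squares via NumPy; the real program could equally use scikit-learn’s `LinearRegression`:

```python
import numpy as np
import pandas as pd

def fit_and_predict(legacy, update, target="rating"):
    """Fit OLS on the legacy data, then run today's metrics through it."""
    X = legacy.drop(columns=[target]).to_numpy(dtype=float)
    y = legacy[target].to_numpy(dtype=float)
    # Prepend a column of ones so the model has an intercept term
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    x_new = np.concatenate([[1.0], update.to_numpy(dtype=float).ravel()])
    return float(x_new @ coef)
```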

We now send out our prediction via email if the user chose this option when setting up. If the user opted out of emails then the program will catch the error and skip this step.
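
A sketch of the email step using the standard library; the SMTP host and the opted-out convention (missing credentials in the user record) are illustrative assumptions:

```python
import smtplib
from email.message import EmailMessage

def build_message(prediction, sender, recipient):
    """Compose the prediction email."""
    msg = EmailMessage()
    msg["Subject"] = f"Tonight's sunset prediction: {prediction:.1f}/10"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Our model predicts tonight's sunset will rate {prediction:.1f} out of 10."
    )
    return msg

def email_prediction(prediction, user):
    """Send the email, skipping quietly if the user opted out of emails."""
    try:
        msg = build_message(prediction, user["email"], user["email"])
        with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
            server.login(user["email"], user["email_password"])
            server.send_message(msg)
    except KeyError:
        pass  # no email credentials on file: the user opted out
```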

We can finally tweet out our sunset prediction and then initiate the rating process via the rating function from earlier. After the real sunset rating is gathered from twitter users we rerun the program. This allows the program to rerun ad infinitum until we manually stop the program.

These are the tweets generated by the program when testing it out on my personal twitter. It’s likely that my followers see my recent tweets and think I’ve lost my mind lol, so hopefully twitter approves my new developer account soon.

P.S. You can get the complete code on my GitHub or follow the Senor Sunset twitter account to get sunset predictions for Austin.