Building A Stock Prediction Software With Python

— AlgoEdge Insights Team

Today, I’m excited to share a new piece of content that’s a bit different from what I usually publish. While I typically don't post code tutorials, I believe today's guide will be valuable even for those who aren't software developers. Whether you're new to coding or just curious, I hope you’ll find it both informative and accessible!

Let’s begin!

This project combines Python with Matplotlib, scikit-learn, and the yfinance package for Yahoo Finance data to predict stock prices. I used scikit-learn’s Linear Regression model because it is simple to fit and easy to interpret, which makes it a reasonable starting point for modeling price trends. To train the model, I pulled historical daily data from Yahoo Finance, which provided essential features like “Close,” “Open,” “Volume,” “High,” and “Low.” I also added a “Date” feature to keep track of daily data. This dataset served as the foundation for training and testing my model. Here’s an example of the raw data I worked with, for Google (GOOG).

Below is the code I implemented to achieve this functionality. I incorporated Python’s datetime module to allow users greater flexibility in setting the time range for the stock data.

from datetime import date
import pandas as pd
import yfinance as yf

howmanyyears = int(input("How many years? > ")) # <-- Getting user input for years
today = date.today()
END_DATE = today.isoformat()
START_DATE = date(today.year - howmanyyears, today.month, today.day).isoformat()

whichstock = input("Which stock? > ") # <-- Getting user input for stock name
data = yf.download(whichstock, start=START_DATE, end=END_DATE)

data.reset_index(inplace=True)
data['Date'] = pd.to_datetime(data.Date) # <-- Inserting the 'Date' Feature

# Outputting the first 15 rows of data
print(data.head(15)) 
print(f"Data: {data.shape}")
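Two details in the download step are worth guarding against. First, building the start date with date(today.year - howmanyyears, today.month, today.day) raises a ValueError when the script runs on February 29 and the target year is not a leap year. Second, depending on the installed yfinance version, the returned frame may carry two-level column labels (field name plus ticker) even for a single ticker. The sketch below shows one way to handle both; the helper names years_ago and flatten_price_columns are hypothetical, and the column layout is an assumption.

```python
from datetime import date

import pandas as pd


def years_ago(today: date, years: int) -> date:
    """Return the date `years` years before `today` (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # Only reachable on Feb 29 when the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


def flatten_price_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose columns are single-level names such as 'Close'."""
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Assumption: level 0 holds the field name and level 1 the ticker.
        out.columns = out.columns.get_level_values(0)
    return out
```

In the script above, this could be used as START_DATE = years_ago(today, howmanyyears).isoformat(), with flatten_price_columns applied to the frame returned by yf.download before any column is accessed.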

In addition to this, I introduced two new features: the 50-day and 200-day Exponential Moving Averages (EMA). These additions helped me assess whether the stock market was leaning towards a Bearish or Bullish trend over specific periods, providing users with deeper insights into stock trends. Before proceeding with the Regression Model, I wanted to visualize some of the data. I created plots comparing High vs. Low prices and charted the daily closing prices alongside the 50-day and 200-day EMAs.

data['EMA-50'] = data['Close'].ewm(span=50, adjust=False).mean()
data['EMA-200'] = data['Close'].ewm(span=200, adjust=False).mean()
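To make the EMA lines less of a black box: with adjust=False, pandas starts the average at the first price and then updates it recursively with a smoothing factor alpha = 2 / (span + 1). The sketch below reproduces that recursion by hand on synthetic prices and checks it against pandas. It also labels each day as bullish or bearish by comparing the two EMAs. That comparison rule is a common convention assumed here for illustration, not necessarily the rule used in this project, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def ema_manual(values, span):
    """Recursive EMA equivalent to Series.ewm(span=span, adjust=False).mean()."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]  # the first EMA value is the first price
    for price in values[1:]:
        out.append(alpha * float(price) + (1 - alpha) * out[-1])
    return np.array(out)


def trend_label(close: pd.Series, fast: int = 50, slow: int = 200) -> pd.Series:
    """Label a day 'Bullish' when the fast EMA is above the slow EMA, else 'Bearish'."""
    fast_ema = close.ewm(span=fast, adjust=False).mean()
    slow_ema = close.ewm(span=slow, adjust=False).mean()
    labels = np.where(fast_ema > slow_ema, "Bullish", "Bearish")
    return pd.Series(labels, index=close.index)


# Synthetic, steadily rising prices for illustration only.
close = pd.Series(np.linspace(100.0, 150.0, 300))
manual = ema_manual(close.to_numpy(), span=50)
from_pandas = close.ewm(span=50, adjust=False).mean().to_numpy()
labels = trend_label(close)
```

A shorter span gives a larger alpha, so the 50-day EMA reacts to new prices faster than the 200-day EMA.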

Now, let’s generate some plots.

import matplotlib.pyplot as plt

# High vs Low Graph
plt.figure(figsize=(8, 4))
plt.plot(data['Date'], data['Low'], label="Low", color="indianred")
plt.plot(data['Date'], data['High'], label="High", color="mediumseagreen")
plt.ylabel('Price (in USD)')
plt.xlabel("Time")
plt.title(f"High vs Low of {whichstock}")
plt.legend()
plt.tight_layout()

# Exponential Moving Average Graph
plt.figure(figsize=(8, 4))
plt.plot(data['Date'], data['EMA-50'], label="EMA for 50 days")
plt.plot(data['Date'], data['EMA-200'], label="EMA for 200 days")
# Plot the same 'Close' series the EMAs were computed from
plt.plot(data['Date'], data['Close'], label="Close")
plt.title(f'Exponential Moving Average for {whichstock}')
plt.ylabel('Price (in USD)')
plt.xlabel("Time")
plt.legend()
plt.tight_layout()

plt.show()  # Display both figures

High vs Low of GOOG

Exponential Moving Average for GOOG

After exploring the dataset, I moved on to building the Linear Regression Model. The main objective of the project was to predict the stock’s Closing Price, so I set it as the target variable (y). The remaining features were used as input variables (x) for the model.

x = data[['Open', 'High', 'Low', 'Volume', 'EMA-50', 'EMA-200']]
y = data['Close']

Next, I used Scikit-learn’s train_test_split function to divide the data into two parts: 80% for training and 20% for testing.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
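One thing to keep in mind: train_test_split shuffles rows by default, so the test days end up interleaved in time with the training days. For time-ordered data, a common alternative is to pass shuffle=False so that the last 20% of rows form the test set. The sketch below shows that variant on synthetic data. It is offered as an option to compare against, not as the split used in this project.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic daily data for illustration only.
dates = pd.date_range("2023-01-02", periods=100, freq="B")
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    {"Open": rng.normal(100, 5, 100), "Close": rng.normal(100, 5, 100)},
    index=dates,
)
x = frame[["Open"]]
y = frame["Close"]

# shuffle=False keeps row order: first 80% for training, last 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False
)
```

If the shuffled split is kept, passing random_state=<some integer> makes the split reproducible between runs.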

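With the data split, I proceeded to fit the Linear Regression Model and generate predictions for the held-out test set.

from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)   # Learn coefficients from the training rows
pred = lr_model.predict(X_test)  # Predicted closing prices for the test rows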

To assess the accuracy of the model, I plotted a graph comparing the model’s predicted values against the actual values.

Real Values VS Predicted Values
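The plotting code for this comparison is not shown, so here is a minimal sketch of one way to draw it: a scatter of actual against predicted prices with a diagonal reference line, where points near the diagonal indicate close predictions. The function name plot_actual_vs_predicted and the synthetic data are assumptions for illustration, and the original figure may have been drawn differently.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(actual, predicted):
    """Scatter predicted against actual values with a y = x reference line."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(actual, predicted, s=12, alpha=0.7, label="Predictions")
    low = min(actual.min(), predicted.min())
    high = max(actual.max(), predicted.max())
    ax.plot([low, high], [low, high], color="gray", linestyle="--",
            label="Perfect prediction")
    ax.set_xlabel("Actual close (in USD)")
    ax.set_ylabel("Predicted close (in USD)")
    ax.set_title("Real Values VS Predicted Values")
    ax.legend()
    fig.tight_layout()
    return fig


# Synthetic values for illustration only.
rng = np.random.default_rng(1)
actual = rng.uniform(100, 150, 50)
predicted = actual + rng.normal(0, 1.5, 50)
fig = plot_actual_vs_predicted(actual, predicted)
```

With the model above, the call would be plot_actual_vs_predicted(y_test, pred), followed by plt.show() to display the figure.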

Additionally, I printed the Real vs. Predicted prices for the stock on a selection of random days. This approach provided a clear comparison of how closely the predicted values matched the actual prices, helping to evaluate whether the model was performing as expected.

d = pd.DataFrame({'Actual_Price': y_test, 'Predicted_Price': pred})

print(d.head(10))
print(d.describe())
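To make the comparison easier to read, the same table can carry an explicit error column. The sketch below adds absolute and percentage errors and pulls out the rows with the largest misses. The prices are made up for illustration, not results from this project, and the column names Abs_Error and Pct_Error are hypothetical.

```python
import pandas as pd

# Hypothetical prices for illustration only.
d = pd.DataFrame({
    "Actual_Price": [100.0, 102.5, 98.0, 105.0],
    "Predicted_Price": [101.0, 102.0, 99.5, 104.0],
})

# Absolute error in USD and error as a percentage of the actual price.
d["Abs_Error"] = (d["Predicted_Price"] - d["Actual_Price"]).abs()
d["Pct_Error"] = 100 * d["Abs_Error"] / d["Actual_Price"]

# Rows where the prediction missed by the most.
worst = d.sort_values("Abs_Error", ascending=False).head(3)
print(worst)
```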

Once the model was complete, the final step was to predict the closing price using the various features. One key relationship that stood out was between Volume and Closing Price, where the model performed exceptionally well, predicting values with minimal error.

Predicted VS Actual Closing Price Based on Volume

To thoroughly evaluate the model’s performance, I examined key statistics, including the r² score, mean absolute error, and mean squared error. Below are the values for each metric.

Results
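For readers who want to compute these metrics themselves, scikit-learn provides r2_score, mean_absolute_error, and mean_squared_error in sklearn.metrics. The sketch below fits a linear regression on synthetic data and prints all three, plus the root mean squared error, which is expressed in the same units as the price. The data and the numbers it produces are illustrative only and do not correspond to the results above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and target for illustration only.
rng = np.random.default_rng(42)
features = rng.uniform(90, 110, size=(200, 3))
target = features @ np.array([0.2, 0.5, 0.3]) + rng.normal(0, 0.5, 200)

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)              # share of variance explained (1.0 is perfect)
mae = mean_absolute_error(y_test, pred)  # average absolute miss
mse = mean_squared_error(y_test, pred)   # average squared miss
rmse = float(np.sqrt(mse))               # same units as the target

print(f"r2={r2:.4f}  MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}")
```

In the project code, the same three calls would take y_test and pred from the fitted lr_model.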

AlgoEdge Insights