CinePredict — Box Office Intelligence

Movie Details

Movie Title

Budget ($M)

Runtime (min)

Release Month

Release Year

Genres

Cast Popularity (avg top 3): 68

Director Reputation: 52

TMDB Popularity Score: 60

Plot Overview (NLP + Sentiment)
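The form fields above map naturally onto a request payload. Below is a minimal sketch of such a payload as a Python dataclass. The class and field names are hypothetical. The defaults reuse the slider values shown on the form (68, 52, 60), and the 0-100 slider scale is an assumption the page does not state.

```python
from dataclasses import dataclass, field


@dataclass
class MovieInput:
    """Hypothetical request payload mirroring the Movie Details form."""

    title: str
    budget_m: float          # Budget ($M)
    runtime_min: int         # Runtime (min)
    release_month: int       # 1-12
    release_year: int
    genres: list[str] = field(default_factory=list)
    # Slider values shown on the form; the 0-100 scale is an assumption.
    cast_popularity: float = 68.0      # average of the top 3 cast members
    director_reputation: float = 52.0
    tmdb_popularity: float = 60.0
    overview: str = ""                 # plot text used for NLP + sentiment

    def __post_init__(self) -> None:
        # Reject values a model could not sensibly use.
        if self.budget_m <= 0:
            raise ValueError("budget_m must be positive")
        if self.runtime_min <= 0:
            raise ValueError("runtime_min must be positive")
        if not 1 <= self.release_month <= 12:
            raise ValueError("release_month must be between 1 and 12")
        for name in ("cast_popularity", "director_reputation", "tmdb_popularity"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100")


movie = MovieInput("Example Film", budget_m=60.0, runtime_min=120,
                   release_month=7, release_year=2015, genres=["Action"])
```

Validating in `__post_init__` keeps bad input from ever reaching the model. A framework-level schema (for example a FastAPI request model) could play the same role.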

Awaiting Prediction

Fill in the movie details and click Predict Box Office to see the AI analysis.

Prediction Result

Worldwide Gross (Predicted), $M

ROI (Return on Budget)

Est. Profit ($M)

Confidence
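The ROI and Est. Profit fields can be derived from the predicted gross and the entered budget. This is a minimal sketch under two assumptions. ROI is reported as a gross-to-budget multiple, which is the form the Data Explorer uses (e.g. 1.71×). Profit is gross minus production budget, ignoring marketing spend and the theaters' share of ticket sales. The function name is hypothetical.

```python
def roi_and_profit(predicted_gross_m: float, budget_m: float) -> tuple[float, float]:
    """Return (ROI multiple, estimated profit in $M) for one film.

    Assumptions: ROI = gross / budget, profit = gross - production budget.
    Marketing spend and the theaters' share of ticket sales are ignored.
    """
    if budget_m <= 0:
        raise ValueError("budget_m must be positive")
    roi = predicted_gross_m / budget_m
    profit_m = predicted_gross_m - budget_m
    return roi, profit_m


roi, profit_m = roi_and_profit(predicted_gross_m=150.0, budget_m=60.0)
print(f"ROI: {roi:.2f}x, Est. Profit: ${profit_m:.1f}M")  # ROI: 2.50x, Est. Profit: $90.0M
```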

Classification Probabilities: 🟢 Hit, 🟡 Average, 🔴 Flop
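One way to turn the three class probabilities into a single label and a Confidence figure is to pick the most probable class. This sketch assumes Confidence is that class's probability, which the page does not state. `top_class` is a hypothetical helper.

```python
def top_class(probs: dict[str, float]) -> tuple[str, float]:
    """Return the most probable class and its probability.

    The probabilities are renormalized so scores that do not sum
    exactly to 1 still yield a confidence in [0, 1].
    """
    if any(p < 0 for p in probs.values()):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    label = max(probs, key=probs.get)
    return label, probs[label] / total


label, confidence = top_class({"Hit": 0.2, "Average": 0.5, "Flop": 0.3})
print(label)  # Average
```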

Key Revenue Drivers

Plot Sentiment: emotional tone of the overview, on a Negative / Neutral / Positive scale
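The tech stack lists NLTK / VADER for sentiment. VADER produces a `compound` score in [-1, 1], and the VADER documentation suggests ±0.05 as the cut-offs between negative, neutral and positive. This sketch maps a compound score onto the three-point scale above. The threshold is that documented convention, not a value confirmed for this app. The NLTK call is shown in a comment because it needs the `vader_lexicon` resource downloaded.

```python
def sentiment_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to Negative / Neutral / Positive.

    The +/-0.05 cut-offs follow the convention suggested by VADER's authors.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# With NLTK installed and the "vader_lexicon" resource downloaded, the score
# for a plot overview could be obtained like this (not executed here):
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(overview)["compound"]
print(sentiment_label(0.62))   # Positive
print(sentiment_label(-0.02))  # Neutral
```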

Revenue Breakdown

Total Movies: 4,803

Avg Budget: $39.8M

Avg Revenue: $68.2M

Avg ROI: 1.71×
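The four summary cards can be computed from per-film records. The page does not say how "Avg ROI" is defined. Dividing the two averages shown ($68.2M / $39.8M ≈ 1.71) reproduces the displayed value, which is consistent with a ratio-of-means definition but does not confirm it. This sketch uses that definition. It also assumes films with a zero budget or revenue are missing data and leaves them out of the averages while still counting them in the total. The record keys `budget` and `revenue` (in $M) are hypothetical.

```python
from statistics import mean


def dataset_summary(movies: list[dict]) -> dict:
    """Summary cards for films with 'budget' and 'revenue' in $M.

    Assumption: films with a zero budget or revenue (missing values) are left
    out of the averages but still counted in total_movies.
    """
    valid = [m for m in movies if m["budget"] > 0 and m["revenue"] > 0]
    if not valid:
        raise ValueError("no films with both budget and revenue")
    avg_budget = mean(m["budget"] for m in valid)
    avg_revenue = mean(m["revenue"] for m in valid)
    return {
        "total_movies": len(movies),
        "avg_budget": avg_budget,
        "avg_revenue": avg_revenue,
        # Ratio of means, not the mean of per-film ratios.
        "avg_roi": avg_revenue / avg_budget,
    }


summary = dataset_summary([
    {"budget": 40.0, "revenue": 100.0},
    {"budget": 60.0, "revenue": 80.0},
    {"budget": 0.0, "revenue": 5.0},  # missing budget: excluded from averages
])
print(summary["avg_roi"])  # 1.8
```

Averaging per-film ratios instead would give a different number, typically pulled upward by a few low-budget breakouts. It is worth stating which one the card shows.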

Revenue Distribution

Performance Label Split

Average Revenue by Release Month (Seasonality)
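A seasonality chart like this one can be built by grouping films on release month. The sketch below also divides each month's average by the overall average. That ratio is one plausible reading of the Seasonal Revenue Index chart under Feature Insights, not a confirmed definition. The record keys `month` and `revenue` are hypothetical.

```python
from collections import defaultdict


def monthly_revenue(movies: list[dict]) -> dict[int, tuple[float, float]]:
    """Return {month: (average revenue, seasonal index)} for months present.

    Seasonal index = month average / overall average (an assumed definition).
    """
    if not movies:
        raise ValueError("no films supplied")
    by_month = defaultdict(list)
    for m in movies:
        by_month[m["month"]].append(m["revenue"])
    overall = sum(m["revenue"] for m in movies) / len(movies)
    if overall <= 0:
        raise ValueError("overall average revenue must be positive")
    result = {}
    for month in sorted(by_month):
        avg = sum(by_month[month]) / len(by_month[month])
        result[month] = (avg, avg / overall)
    return result


stats = monthly_revenue([
    {"month": 6, "revenue": 300.0},
    {"month": 6, "revenue": 100.0},
    {"month": 1, "revenue": 50.0},
    {"month": 1, "revenue": 30.0},
])
print(stats[6][0])  # 200.0
```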

Genre Performance

#	Genre	Avg Revenue ($M)	Trend	Hit Rate
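The Avg Revenue and Hit Rate columns can be filled by aggregating per genre. This sketch counts a film toward every genre it is tagged with, which is an assumption for multi-genre films. It defines a hit using the dashboard's own HIT threshold (revenue ≥ 2× production budget). The record keys are hypothetical, and the Trend column is not covered.

```python
from collections import defaultdict


def genre_performance(movies: list[dict]) -> list[tuple[str, float, float]]:
    """Rows of (genre, average revenue in $M, hit rate), best-earning genre first.

    A film counts toward every genre it is tagged with. A HIT uses the
    dashboard's threshold: revenue >= 2x production budget.
    """
    totals = defaultdict(lambda: {"revenue": 0.0, "count": 0, "hits": 0})
    for m in movies:
        is_hit = m["budget"] > 0 and m["revenue"] >= 2 * m["budget"]
        for genre in set(m["genres"]):  # set() avoids double-counting a repeated tag
            t = totals[genre]
            t["revenue"] += m["revenue"]
            t["count"] += 1
            t["hits"] += int(is_hit)
    rows = [(g, t["revenue"] / t["count"], t["hits"] / t["count"]) for g, t in totals.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


rows = genre_performance([
    {"genres": ["Action", "Adventure"], "budget": 100.0, "revenue": 250.0},
    {"genres": ["Action"], "budget": 50.0, "revenue": 60.0},
    {"genres": ["Drama"], "budget": 10.0, "revenue": 5.0},
])
for rank, (genre, avg_rev, hit_rate) in enumerate(rows, start=1):
    print(rank, genre, avg_rev, f"{hit_rate:.0%}")
```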

Regression — Revenue Prediction

R² Score: 0.7340 (variance explained by the model)

RMSE: $48.2M (root mean squared error)

MAE: $29.7M (mean absolute error)
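The three regression metrics have standard definitions. R² = 1 − SS_res / SS_tot, RMSE is the square root of the mean squared residual, and MAE is the mean absolute residual. The sketch below computes them in plain Python. scikit-learn provides the same quantities as `r2_score`, `mean_squared_error` and `mean_absolute_error` in `sklearn.metrics`. The page does not say whether its figures were computed on raw or log-transformed revenue, so the sketch makes no assumption about that.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """R², RMSE and MAE using their standard definitions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when every true value is identical")
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


m = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(round(m["r2"], 3), round(m["rmse"], 2), m["mae"])  # 0.948 2.55 2.5
```

RMSE squares each error before averaging, so it is never smaller than MAE. The gap between them ($48.2M vs $29.7M above) reflects a few large misses.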

Classification — Hit / Average / Flop

Accuracy: 0.6820 (overall correct predictions)

F1 Score: 0.6740 (weighted F1 across classes)

Precision: 0.6810 (weighted precision)
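"Weighted" here presumably means each class's score is weighted by its support (the number of true examples of that class). That is the convention of scikit-learn's `precision_score` / `f1_score` with `average="weighted"`, though the page does not name the functions it used. A plain-Python sketch of that convention for the Hit / Average / Flop labels:

```python
from collections import Counter


def classification_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus support-weighted precision and F1 over all labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    support = Counter(y_true)       # true count per class, used as the weight
    predicted = Counter(y_pred)     # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    w_precision = w_f1 = 0.0
    for label in set(y_true) | set(y_pred):
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support[label] / n
        w_precision += weight * precision
        w_f1 += weight * f1
    return {"accuracy": sum(correct.values()) / n, "precision": w_precision, "f1": w_f1}


metrics = classification_metrics(
    ["Hit", "Hit", "Average", "Flop"],
    ["Hit", "Average", "Average", "Flop"],
)
print({k: round(v, 3) for k, v in metrics.items()})
# {'accuracy': 0.75, 'precision': 0.875, 'f1': 0.75}
```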

AutoML Model Leaderboard (PyCaret)

Rank	Model	R²	RMSE	MAE	Status
🥇 1	XGBoost	0.7340	0.4820	0.3510	BEST
2	LightGBM	0.7210	0.4950	0.3640	SELECTED
3	Gradient Boosting	0.7080	0.5110	0.3780	—
4	Random Forest	0.6920	0.5290	0.3920	—
5	Ridge Regression	0.6310	0.5940	0.4510	—

Predicted vs Actual Revenue

Residuals Distribution

Feature Importance (Top 15)

Genre Revenue Multipliers

Seasonal Revenue Index

Classification Thresholds

HIT: Revenue ≥ 2× Production Budget

AVERAGE: 1× ≤ Revenue < 2× Production Budget

FLOP: Revenue < Production Budget
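The three labels follow directly from these thresholds. This is a minimal sketch. The function name is hypothetical, and rejecting non-positive budgets (common as missing values in TMDB data) is an added assumption.

```python
def performance_label(revenue: float, budget: float) -> str:
    """HIT if revenue >= 2x budget, AVERAGE if budget <= revenue < 2x budget, else FLOP."""
    if budget <= 0:
        raise ValueError("budget must be positive to assign a label")
    if revenue >= 2 * budget:
        return "HIT"
    if revenue >= budget:
        return "AVERAGE"
    return "FLOP"


print(performance_label(150, 60))  # HIT
print(performance_label(90, 60))   # AVERAGE
print(performance_label(45, 60))   # FLOP
```

Both boundaries are inclusive on the upper label. Revenue of exactly 2× budget is a HIT, and revenue exactly equal to the budget is AVERAGE.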

Quick Start

1. Download the TMDB 5000 dataset from Kaggle
2. Place the CSVs in backend/data/raw/
3. Run the training pipeline
4. Start the API server

pip install -r requirements.txt
python backend/train_model.py
uvicorn backend.api:app --reload
# Frontend: open frontend/index.html

Tech Stack

FastAPI: REST API backend

scikit-learn: ML pipelines

XGBoost / LightGBM: gradient boosting models

PyCaret: AutoML model selection

NLTK / VADER: sentiment analysis

SHAP: model explainability

Pandas / NumPy: data wrangling

Chart.js: frontend visualizations

TMDB 5000: training dataset (Kaggle)

Feature Categories

Financial — budget, log_budget, budget_per_minute
Temporal — month, year, is_summer, is_holiday
Content — runtime, 20 genre flags, genre_count
Popularity — tmdb_popularity, vote_average, vote_count
Cast/Crew — cast_popularity, director_popularity
NLP — sentiment, pos/neg ratio, action/comedy scores
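A few of the Financial, Temporal and Content features can be sketched as follows. The feature names come from the list above, but their exact definitions are assumptions. Summer is taken as May to August and holiday as November to December. `log_budget` uses log(1 + budget) so a zero budget does not fail, and budgets are in $M. The function name is hypothetical.

```python
import math

# Assumed month sets: the feature list names is_summer and is_holiday but
# does not define which months they cover.
SUMMER_MONTHS = {5, 6, 7, 8}
HOLIDAY_MONTHS = {11, 12}


def engineer_features(budget_m: float, runtime_min: float, month: int, genres: list[str]) -> dict:
    """Build a few of the listed Financial, Temporal and Content features."""
    if budget_m < 0 or runtime_min <= 0:
        raise ValueError("budget must be non-negative and runtime positive")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return {
        "budget": budget_m,
        "log_budget": math.log1p(budget_m),  # log(1 + budget) tolerates a zero budget
        "budget_per_minute": budget_m / runtime_min,
        "month": month,
        "is_summer": int(month in SUMMER_MONTHS),
        "is_holiday": int(month in HOLIDAY_MONTHS),
        "genre_count": len(set(genres)),
    }


features = engineer_features(budget_m=60.0, runtime_min=120, month=7, genres=["Action", "Adventure"])
print(features["budget_per_minute"], features["is_summer"], features["genre_count"])  # 0.5 1 2
```

The log transform compresses the long right tail of budgets, which tends to help linear models such as the Ridge baseline more than tree ensembles.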
