BOX OFFICE INTELLIGENCE
Movie Details

Awaiting Prediction

Fill in the movie details and click Predict Box Office to see the AI analysis.

Prediction Result
Worldwide Gross (Predicted, $M)
ROI (Return on Budget)
Est. Profit ($M)
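A minimal sketch of how the Return on Budget and Est. Profit fields could be derived from a predicted gross. It assumes ROI is the revenue-to-budget multiple (the same "×" convention as the Avg ROI card) and that profit is simply gross minus production budget; `roi_and_profit` is a hypothetical helper, not the app's own code.

```python
def roi_and_profit(predicted_gross_m: float, budget_m: float) -> tuple[float, float]:
    """Return (ROI multiple, estimated profit in $M).

    Assumption: ROI = gross / budget and profit = gross - budget.
    Marketing spend and exhibitor revenue splits are ignored.
    """
    if budget_m <= 0:
        raise ValueError("budget must be positive")
    roi = predicted_gross_m / budget_m
    profit = predicted_gross_m - budget_m
    return roi, profit


roi, profit = roi_and_profit(150.0, 60.0)
print(f"ROI: {roi:.2f}x, Est. Profit: ${profit:.1f}M")  # ROI: 2.50x, Est. Profit: $90.0M
```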
Confidence
Classification Probabilities
🟢 Hit
🟡 Average
🔴 Flop
Key Revenue Drivers
Plot Sentiment
Emotional tone of the plot overview
Negative / Neutral / Positive
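The three-way sentiment gauge can be sketched by bucketing VADER's compound score, which ranges from -1 to 1. The ±0.05 cut-offs below are the thresholds suggested in the VADER documentation; whether this dashboard uses the same cut-offs is an assumption, and `sentiment_label` is a hypothetical helper.

```python
def sentiment_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to Negative / Neutral / Positive."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# With NLTK installed and the 'vader_lexicon' resource downloaded, the score
# for a plot overview string could come from:
#   from nltk.sentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(overview)["compound"]
print(sentiment_label(0.62), sentiment_label(0.0), sentiment_label(-0.4))  # Positive Neutral Negative
```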
Revenue Breakdown
Total Movies: 4,803
Avg Budget: $39.8M
Avg Revenue: $68.2M
Avg ROI: 1.71×
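The Avg ROI figure equals Avg Revenue divided by Avg Budget ($68.2M / $39.8M ≈ 1.71), which suggests it is a ratio of averages rather than the mean of each film's own ROI. On skewed box-office data the two can differ substantially. The small sketch below uses made-up films (not TMDB 5000 rows) to show the gap.

```python
from statistics import mean

# Hypothetical (budget, revenue) pairs in $M; not taken from TMDB 5000.
films = [(10.0, 80.0), (100.0, 150.0), (50.0, 40.0)]

avg_budget = mean(b for b, _ in films)
avg_revenue = mean(r for _, r in films)

ratio_of_means = avg_revenue / avg_budget       # matches how 1.71x appears to be derived
mean_of_ratios = mean(r / b for b, r in films)  # average of per-film ROI multiples

print(f"{ratio_of_means:.2f}x vs {mean_of_ratios:.2f}x")  # 1.69x vs 3.43x
```

A single low-budget breakout (here 8× on a $10M budget) pulls the per-film average far above the pooled ratio.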
Revenue Distribution
Performance Label Split
Average Revenue by Release Month (Seasonality)
Genre Performance
# | Genre | Avg Revenue ($M) | Trend | Hit Rate
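A sketch of how the Avg Revenue and Hit Rate columns of the genre table could be aggregated. It assumes that a multi-genre film counts toward every genre it lists and that Hit Rate is the share of a genre's films meeting the Hit threshold (revenue ≥ 2× budget). The records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (genres, budget $M, revenue $M).
films = [
    (["Action", "Adventure"], 150.0, 400.0),
    (["Action"], 80.0, 120.0),
    (["Comedy"], 20.0, 60.0),
    (["Comedy", "Romance"], 30.0, 25.0),
]

stats = defaultdict(lambda: {"revenue": [], "hits": 0})
for genres, budget, revenue in films:
    for g in genres:  # a film contributes to each of its genres
        stats[g]["revenue"].append(revenue)
        stats[g]["hits"] += revenue >= 2 * budget  # Hit: revenue >= 2x budget

# One row per genre: (genre, average revenue, hit rate), ranked by average revenue.
rows = sorted(
    (
        (g, sum(s["revenue"]) / len(s["revenue"]), s["hits"] / len(s["revenue"]))
        for g, s in stats.items()
    ),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (genre, avg_rev, hit_rate) in enumerate(rows, start=1):
    print(f"{rank} {genre:<10} {avg_rev:7.1f} {hit_rate:.0%}")
```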
Regression — Revenue Prediction
R² Score: 0.7340 (variance explained by the model)
RMSE: $48.2M (root mean squared error)
MAE: $29.7M (mean absolute error)
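For reference, the three regression metrics follow their standard definitions. The stdlib sketch below, run on hypothetical revenues rather than model output, should agree with scikit-learn's `r2_score`, `mean_absolute_error` and the square root of `mean_squared_error`. Because RMSE squares the residuals, it is always at least as large as MAE, which is consistent with the $48.2M vs $29.7M pair reported for this model. The same residuals are what a residuals-distribution plot would histogram.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """R², RMSE and MAE from their textbook definitions (needs non-constant `actual`)."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical revenues in $M
print(regression_metrics([100.0, 50.0, 200.0, 10.0], [90.0, 70.0, 180.0, 20.0]))
```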
Classification — Hit / Average / Flop
Accuracy: 0.6820 (overall share of correct predictions)
F1 Score: 0.6740 (weighted F1 across classes)
Precision: 0.6810 (weighted precision across classes)
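"Weighted" here most likely means the scikit-learn convention (`average="weighted"`): compute each class's precision and F1, then average them with weights equal to each class's share of the true labels. A stdlib sketch of that convention for the Hit / Average / Flop labels, on invented predictions, returning 0 where a metric is undefined:

```python
from collections import Counter

LABELS = ("Hit", "Average", "Flop")


def classification_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus support-weighted precision and F1 over LABELS."""
    n = len(y_true)
    support = Counter(y_true)  # number of true instances per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    w_precision = w_f1 = 0.0
    for label in LABELS:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        pred_count = sum(p == label for p in y_pred)
        precision = tp / pred_count if pred_count else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support[label] / n  # classes absent from y_true get weight 0
        w_precision += weight * precision
        w_f1 += weight * f1
    return {"accuracy": accuracy, "precision": w_precision, "f1": w_f1}


print(classification_metrics(["Hit", "Hit", "Hit", "Flop"], ["Hit", "Hit", "Flop", "Flop"]))
```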
AutoML Model Leaderboard (PyCaret)
Rank | Model | R² | RMSE | MAE | Status
🥇 1 | XGBoost | 0.7340 | 0.4820 | 0.3510 | BEST
2 | LightGBM | 0.7210 | 0.4950 | 0.3640 | SELECTED
3 | Gradient Boosting | 0.7080 | 0.5110 | 0.3780 |
4 | Random Forest | 0.6920 | 0.5290 | 0.3920 |
5 | Ridge Regression | 0.6310 | 0.5940 | 0.4510 |
Predicted vs Actual Revenue
Residuals Distribution
Feature Importance (Top 15)
Genre Revenue Multipliers
Seasonal Revenue Index
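One common way to build a seasonal revenue index, assumed here since the dashboard does not define it, is each release month's average revenue divided by the overall average revenue, so 1.0 marks a typical month and values above 1.0 mark stronger months. The same ratio computed per genre would give a genre revenue multiplier. `seasonal_index` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_index(releases: list[tuple[int, float]]) -> dict[int, float]:
    """Map release month (1-12) to month-average revenue / overall-average revenue."""
    by_month: dict[int, list[float]] = defaultdict(list)
    for month, revenue in releases:
        by_month[month].append(revenue)
    overall = mean(revenue for _, revenue in releases)
    return {m: mean(revs) / overall for m, revs in sorted(by_month.items())}


# Hypothetical (release month, revenue $M) pairs
print(seasonal_index([(6, 300.0), (6, 100.0), (9, 40.0), (12, 160.0)]))
```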
Classification Thresholds
HIT: Revenue ≥ 2× production budget
AVERAGE: 1× ≤ revenue < 2× production budget
FLOP: Revenue < production budget
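The thresholds translate directly into a labeling function. Note the boundaries: exactly 2× budget counts as a Hit and exactly 1× counts as Average. The guard against non-positive budgets is an added assumption (a zero budget would make the ratio undefined), and `performance_label` is a hypothetical name.

```python
def performance_label(revenue: float, budget: float) -> str:
    """Label a film Hit / Average / Flop from its revenue-to-budget ratio."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = revenue / budget
    if ratio >= 2:      # Revenue >= 2x production budget
        return "Hit"
    if ratio >= 1:      # 1x <= revenue < 2x budget
        return "Average"
    return "Flop"       # Revenue < production budget


print(performance_label(250.0, 100.0), performance_label(100.0, 100.0), performance_label(99.0, 100.0))
# Hit Average Flop
```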
Quick Start
1. Download TMDB 5000 dataset from Kaggle
2. Place CSVs in backend/data/raw/
3. Run training pipeline
4. Start the API server
pip install -r requirements.txt
python backend/train_model.py
uvicorn backend.api:app --reload
# Frontend: open frontend/index.html
Tech Stack
FastAPI: REST API backend
scikit-learn: ML pipelines
XGBoost / LightGBM: gradient boosting models
PyCaret: AutoML model selection
NLTK / VADER: sentiment analysis
SHAP: model explainability
Pandas / NumPy: data wrangling
Chart.js: frontend visualizations
TMDB 5000: training dataset (Kaggle)
Feature Categories
Financial — budget, log_budget, budget_per_minute
Temporal — month, year, is_summer, is_holiday
Content — runtime, 20 genre flags, genre_count
Popularity — tmdb_popularity, vote_average, vote_count
Cast/Crew — cast_popularity, director_popularity
NLP — sentiment, pos/neg ratio, action/comedy scores
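A sketch of how a few of the Financial, Temporal and Content features could be derived for one film. Several details are assumptions rather than the project's definitions: budget is in raw dollars, `log_budget` uses `log1p` so zero budgets stay finite, `budget_per_minute` is budget divided by runtime, "summer" means May to August and "holiday" means November and December. `engineer_features` is a hypothetical helper.

```python
import math
from datetime import date

# Assumed release windows; the feature list does not define them.
SUMMER_MONTHS = {5, 6, 7, 8}
HOLIDAY_MONTHS = {11, 12}


def engineer_features(budget: float, runtime_min: float, release: date,
                      genres: list[str]) -> dict[str, float]:
    """Derive a subset of the listed features for a single film."""
    return {
        "budget": budget,
        "log_budget": math.log1p(budget),  # log(1 + budget) keeps zero budgets finite
        "budget_per_minute": budget / runtime_min if runtime_min > 0 else 0.0,
        "month": release.month,
        "year": release.year,
        "is_summer": int(release.month in SUMMER_MONTHS),
        "is_holiday": int(release.month in HOLIDAY_MONTHS),
        "runtime": runtime_min,
        "genre_count": len(genres),
    }


print(engineer_features(150_000_000, 120, date(2015, 7, 10), ["Action", "Science Fiction"]))
```

The 20 genre flags would typically be one 0/1 column per TMDB genre name, alongside `genre_count`.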