🧠
Multi-Class ANN
→ train_model.py · build_model() · /api/career
A 4-layer feedforward ANN classifies a 30-dimensional skill vector into one of 14 career paths. The output layer has 14 neurons, one per career, and Softmax converts them to probabilities. The dense-plus-Softmax output head is the same one ImageNet image classifiers use; only the feature-extracting layers differ.
Input(30) → Dense(128,ReLU) + BN + Drop(0.3)
          → Dense(64,ReLU) + BN + Drop(0.2)
          → Dense(32,ReLU)
          → Dense(14, Softmax)
# 15,534 trainable params
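Assembled in Keras functional style, the stack above might look like this. This is a sketch from the diagram; the actual build_model() in train_model.py may differ in details such as initialisers or layer names:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features=30, n_careers=14):
    inp = keras.Input(shape=(n_features,))
    x = layers.Dense(128)(inp)            # layer 1: Dense -> BN -> ReLU -> Drop
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(64)(x)               # layer 2, same pattern
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(32, activation='relu')(x)
    out = layers.Dense(n_careers, activation='softmax')(x)  # 14 careers
    return keras.Model(inp, out)
```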
📐
Cosine Similarity
→ app.py · cosine_similarity() · all routes
Measures the angle between two skill sets treated as vectors. Used alongside the ANN probability for career matching, and as the primary engine for Interview IQ. cos(θ) = 1 means a perfect match. Same math as Word2Vec and BERT semantic search.
cos(θ) = (A·B) / (|A| × |B|)
       = |intersection| / (√|A| × √|B|)
# 0 = no match, 1 = perfect match
def cosine_similarity(a, b):
    a, b = set(a), set(b)   # skills as binary presence vectors
    if not a or not b:
        return 0.0          # guard against empty skill lists
    return len(a & b) / (len(a) * len(b)) ** 0.5
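A worked example, with invented skill lists (the helper mirrors the card's function):

```python
def cosine_similarity(a, b):
    """Set-based cosine over binary skill vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5

user = ["python", "sql", "statistics", "excel"]   # |A| = 4
job = ["python", "sql", "ml"]                     # |B| = 3
score = cosine_similarity(user, job)              # 2 shared / sqrt(4*3)
print(round(score, 3))  # 0.577
```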
⚡
ReLU Activation
→ train_model.py · all hidden layers
ReLU, f(x) = max(0, x), introduces non-linearity. Without it, stacking Dense layers collapses into a single linear map, useless for complex patterns. ReLU also mitigates vanishing gradients: its derivative is exactly 1 for positive inputs, so gradients reach early layers undiminished and deep networks can train.
def relu(x): return max(0, x)
# Keras: layers.Activation('relu')
# 3 hidden layers use ReLU
# Output layer uses Softmax instead
# Used in JS too: relu(score * weight)
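The collapse claim can be checked numerically with random weight matrices (shapes borrowed from the first two layers, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(30, 128)), rng.normal(size=(128, 14))
x = rng.normal(size=(1, 30))

# Two linear layers with no activation...
two_linear = (x @ W1) @ W2
# ...equal ONE linear layer whose weights are W1 @ W2.
one_linear = x @ (W1 @ W2)
print(np.allclose(two_linear, one_linear))  # True

# With ReLU between them the equivalence breaks: no single
# matrix reproduces the mapping, so depth adds real capacity.
relu = lambda z: np.maximum(0, z)
nonlinear = relu(x @ W1) @ W2
print(np.allclose(nonlinear, one_linear))   # False
```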
🎲
Softmax Output
→ train_model.py output · app.py career route
Converts raw ANN logits to a probability distribution summing to 1. The career with the highest softmax output = predicted career. The match % shown is the softmax confidence score. Used identically in all multi-class classifiers.
softmax(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)
# All outputs ∈ [0,1], sum = 1.0
# In app.py:
from math import exp

def softmax(scores):
    exps = [exp(s * 3) for s in scores]  # *3 = temperature, sharpens peaks
    total = sum(exps)
    return [e / total for e in exps]
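A quick check that the outputs behave like probabilities (input scores invented):

```python
from math import exp

def softmax(scores):
    exps = [exp(s * 3) for s in scores]  # temperature scaling as in app.py
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.2, 0.9, 0.4])
print(round(sum(probs), 6))     # 1.0 -> a valid distribution
print(probs.index(max(probs)))  # 1  -> highest-scoring career wins
```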
🛡️
Dropout Regularisation
→ train_model.py · Dropout(0.3/0.2)
Randomly zeros 30%/20% of neurons in each training batch. This forces the network to learn redundant, distributed representations, so it cannot memorise individual training patterns. Without dropout, the ANN overfits our synthetic dataset and gives wrong predictions on real user skills.
layers.Dropout(0.30) # 30% zeroed layer 1
layers.Dropout(0.20) # 20% zeroed layer 2
# Inverted: remaining * 1/(1-p)
# OFF during model.predict() inference
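A minimal NumPy sketch of inverted dropout, showing both the training-time mask-and-rescale and the inference-time identity (not the Keras internals, just the idea):

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero a fraction p, scale survivors by 1/(1-p)."""
    if not training:
        return x                     # identity at inference, as in Keras
    mask = rng.random(x.shape) >= p  # each unit kept with probability 1-p
    return x * mask / (1.0 - p)      # rescale so expected sum is unchanged

rng = np.random.default_rng(42)
acts = np.ones((1, 10))
out = dropout(acts, 0.3, rng)
# Every surviving unit becomes 1/0.7 ~= 1.43; dropped units are 0.
print(out)
```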
📊
Batch Normalisation
→ train_model.py · BatchNormalization() × 2
Normalises layer outputs to mean≈0, std≈1 within each mini-batch. Stabilises training and allows Adam to use lr=0.001 safely. Without BN, our ANN would need careful lr tuning and would train 3–5× slower on the career classification task.
x = layers.Dense(128)(inp)
x = layers.BatchNormalization()(x)  # μ≈0, σ²≈1
x = layers.Activation('relu')(x)
x = layers.Dropout(0.3)(x)
# γ, β are learnable per-feature params
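What BN computes per mini-batch, sketched in NumPy (training-mode statistics only; the real layer also tracks moving averages for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm: per-feature stats over one mini-batch."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # mean~0, var~1 per feature
    return gamma * x_hat + beta            # learnable rescale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # badly-centred inputs
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
# Each feature column now has mean ~0 and std ~1.
print(out.mean(axis=0), out.std(axis=0))
```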
🔍
TF-IDF (Interview IQ)
→ app.py · tfidf_weight() · /api/interview
Weights each JD keyword by TF (how often it appears in this JD) × IDF (how rare it is across all JDs). This surfaces what the job truly requires vs. generic filler. A skill mentioned 4× in a niche JD scores higher than a common skill mentioned once.
TF(w,d) = count(w in doc) / total_words
IDF(w) = log(N / docs_with_w) + 1
TF-IDF = TF × IDF
# Converts raw JD text → weighted vector
# Then cosine_sim(jd_vec, resume_vec)
# = readiness score per topic
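A minimal sketch of the two formulas above; the function name and toy JD corpus are invented, and the real tfidf_weight() in app.py may differ:

```python
from math import log

def tfidf_weights(doc, corpus):
    """TF-IDF weight for every word in one JD, against a corpus of JDs."""
    words = doc.lower().split()
    weights = {}
    for w in set(words):
        tf = words.count(w) / len(words)                 # frequency in this JD
        docs_with_w = sum(1 for d in corpus if w in d.lower().split())
        idf = log(len(corpus) / docs_with_w) + 1         # +1 keeps common words > 0
        weights[w] = tf * idf
    return weights

jds = ["python sql python airflow", "excel sql reporting", "java spring sql"]
w = tfidf_weights(jds[0], jds)
# "python" appears twice and only in this JD; "sql" once, in every JD.
print(w["python"] > w["sql"])  # True
```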
🎯
Adam Optimizer
→ train_model.py · Adam(lr=0.001)
Adam = Momentum + RMSProp. Each weight gets its own adaptive learning rate. Momentum stores gradient direction, RMSProp scales by recent gradient magnitude. Converges 5–10× faster than vanilla SGD on the career classification task.
m = β₁·m + (1-β₁)·g       # momentum
v = β₂·v + (1-β₂)·g²      # RMSProp
w = w - lr·m̂/(√v̂ + ε)     # update
# β₁=0.9, β₂=0.999, lr=0.001, ε=1e-7
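The three update lines, sketched as a NumPy function and run on a toy quadratic loss (bias correction included; this illustrates the rule, not Keras's implementation):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    """One Adam update for weights w given gradient g at step t >= 1."""
    m = b1 * m + (1 - b1) * g            # 1st moment (momentum)
    v = b2 * v + (1 - b2) * g ** 2       # 2nd moment (RMSProp)
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m = v = np.zeros(2)
for t in range(1, 4):          # three steps on loss = 0.5 * ||w||^2
    g = w                      # gradient of 0.5*w^2 is w
    w, m, v = adam_step(w, g, m, v, t)
print(w)  # both weights moved slightly toward 0, ~lr per step
```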
📉
Cross-Entropy Loss
→ train_model.py · compile(loss=...)
Sparse Categorical Cross-Entropy is the loss for multi-class classification. Measures how wrong the ANN's probability distribution is vs. the true career label. Minimised via backpropagation over 150 epochs. L = −log(predicted_prob_of_true_class).
model.compile(
loss='sparse_categorical_crossentropy',
optimizer='adam', metrics=['accuracy']
)
# 'sparse' = integer labels, not one-hot
# Perfect prediction: L → 0
# Wrong prediction: L → ∞
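The per-sample loss computed by hand, over an invented 3-class softmax output:

```python
from math import log

def sparse_categorical_crossentropy(y_true, probs):
    """Loss for one sample: y_true is an integer class label, not one-hot."""
    return -log(probs[y_true])

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output over 3 careers
# Confident and correct -> small loss; confident and wrong -> large loss.
print(round(sparse_categorical_crossentropy(0, probs), 4))  # 0.3567
print(round(sparse_categorical_crossentropy(2, probs), 4))  # 2.3026
```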
🔢
MinMax Normalisation
→ train_model.py · MinMaxScaler + app.py
Scales all features to [0,1] before they reach the ANN. Without this, a skill count of 25 dominates a velocity score of 0.8 during gradient descent. Critical rule: fit the scaler ONLY on training data. Fitting it on test data = data leakage = inflated accuracy.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# fit_transform: learns min/max from TRAIN
# transform: uses SAME params on TEST
# NEVER fit on test = data leakage
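A hand-rolled scaler makes the train-only rule concrete (a sketch of what MinMaxScaler does, not its full API; the data is invented):

```python
import numpy as np

class MinMax:
    """Minimal MinMaxScaler sketch: min/max learned on train data only."""
    def fit(self, X):
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        return self
    def transform(self, X):
        return (X - self.lo) / (self.hi - self.lo)

X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test = np.array([[12.0, 25.0]])   # first feature exceeds the train max

s = MinMax().fit(X_train)           # params come from TRAIN only
print(s.transform(X_train))         # every train column lands in [0, 1]
print(s.transform(X_test))          # test can fall outside [0, 1]; that's fine
```

Refitting on the test set would squeeze it back into [0,1] and silently leak its range into the model, which is exactly the bug the card warns about.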