<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ramieeee's IT blog]]></title><description><![CDATA[Algorithms, IT news, my thoughts note]]></description><link>https://ramieeee.me</link><generator>RSS for Node</generator><lastBuildDate>Sun, 10 May 2026 15:55:33 GMT</lastBuildDate><atom:link href="https://ramieeee.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[[Paper Review] Development of a machine learning
approach for prediction of red
blood cell transfusion in patients
undergoing Cesarean section
at a single institution]]></title><description><![CDATA[The current interest of mine is leaned towards the AI research applied to biomedical or healthcare domains. While scanning through papers, I spotted this paper and I decided to have a look at it. Inde]]></description><link>https://ramieeee.me/paper-review-development-of-a-machine-learning-approach-for-prediction-of-red-blood-cell-transfusion-in-patients-undergoing-cesarean-section-at-a-single-institution</link><guid isPermaLink="true">https://ramieeee.me/paper-review-development-of-a-machine-learning-approach-for-prediction-of-red-blood-cell-transfusion-in-patients-undergoing-cesarean-section-at-a-single-institution</guid><category><![CDATA[ML]]></category><category><![CDATA[Paper Review]]></category><category><![CDATA[blood transfusion]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 04 Apr 2026 14:56:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/b97723c3-9f04-4ec4-a0ba-1fa99f009534.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My current interest leans towards AI research applied to biomedical and healthcare domains. While scanning through papers, I spotted this one and decided to have a look. It was indeed intriguing, and worth sparing the time to read.</p>
<p>It describes ML models trained to predict whether parturients will require a blood transfusion while undergoing cesarean section, published in Scientific Reports in 2024.</p>
<h1>Introduction</h1>
<p>Blood loss during a cesarean section (CS) can be substantial, and transfusion is frequently required rather than occasional.</p>
<p>Preparing the right amount of blood in advance saves time intraoperatively. Blood is a scarce medical resource, so the correct amount must be prepared before surgery to avoid waste; accurate prediction of transfusion demand is therefore always needed.</p>
<p>This paper articulates the data preparation and model comparison metrics, and gives insight into which data to leverage for training ML models.</p>
<p>This paper focuses on red blood cells, excluding other blood products. Its primary aim is to select the ML model with the best performance in predicting the need for an intraoperative red blood cell (RBC) transfusion during a cesarean section (CS).</p>
<h1>Methods</h1>
<p>The data could be divided into two parts: demographic data of the patient and perioperative data.</p>
<ul>
<li><p>Data</p>
<ul>
<li><p>Data size</p>
<ul>
<li><p>total: 16,137</p>
</li>
<li><p>used: 14,254 after excluding incomplete records</p>
</li>
<li><p>RBC transfusion during surgery: 1,020 patients (7.16% of the total)</p>
</li>
<li><p>data split: 6:2:2 for training, validation and test</p>
</li>
</ul>
</li>
<li><p>the most recent data values within two days prior to surgery</p>
<ul>
<li><p>demographic data</p>
<ul>
<li><p>age, weight, height etc.</p>
</li>
<li><p>placenta previa totalis/partialis/marginalis</p>
</li>
</ul>
</li>
<li><p>perioperative data</p>
<ul>
<li>anesthesia, midazolam use, RBC transfusion</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
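<p>A 6:2:2 split on a roughly 7%-positive cohort is usually stratified so each part keeps the same positive rate; a minimal sketch of that idea (my assumption, the paper does not publish its split code):</p>

```python
import random

def split_6_2_2(labels, seed=42):
    """Split row indices 6:2:2 for train/validation/test while preserving the
    class ratio in each part. Illustrative sketch only -- stratification here
    is my assumption, not the authors' published procedure."""
    rng = random.Random(seed)
    parts = ([], [], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        a, b = int(len(idx) * 0.6), int(len(idx) * 0.8)
        for part, chunk in zip(parts, (idx[:a], idx[a:b], idx[b:])):
            part.extend(chunk)
    return parts  # (train_idx, val_idx, test_idx)
```

<p>Each split then contains the same fraction of transfusion cases as the full cohort, which keeps validation and test metrics comparable.</p>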
<p>The class ratio used for training was particularly interesting to see. It is uncommon to deliberately apply imbalanced ratios to the ground-truth labels, yet this paper compared ratios from 1:1 up to 1:4.</p>
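<p>The usual way to build such fixed-ratio training sets is to undersample the majority class; a small sketch (my illustration, not the authors' code):</p>

```python
import random

def undersample_to_ratio(pos_idx, neg_idx, ratio, seed=0):
    """Keep all positives and `ratio` times as many randomly chosen negatives,
    e.g. ratio=4 builds a 1:4 positive:negative training set."""
    rng = random.Random(seed)
    n_neg = min(len(neg_idx), ratio * len(pos_idx))
    return list(pos_idx) + rng.sample(list(neg_idx), n_neg)
```

<p>Sweeping `ratio` from 1 to 4 reproduces the kind of comparison the paper reports.</p>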
<ul>
<li><p>ML models</p>
<ul>
<li>XGBoost, KNN, DT, SVM, MLP, LR, RF, DNN</li>
</ul>
</li>
<li><p>Model assessment</p>
<ul>
<li><p>AUROC</p>
</li>
<li><p>AUPRC</p>
</li>
<li><p>metrics</p>
<ul>
<li>accuracy, recall, precision, F1</li>
</ul>
</li>
</ul>
</li>
</ul>
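<p>AUROC, one of the assessment metrics above, can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one; a minimal pairwise sketch of that reading:</p>

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs where the positive case receives the higher score (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

<p>This O(n&#xB2;) form is only for intuition; production code would use a rank-based or library implementation.</p>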
<h1>Results</h1>
<img src="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/7c2569ee-9c11-4c18-a38c-457af4507437.png" alt="" style="display:block;margin:0 auto" />

<p>According to the paper, XGBoost excelled most at predicting blood transfusion, with an AUROC of 0.82 and an accuracy of 0.94.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/062acfe7-6dd5-491c-b836-36ac19b61006.png" alt="" style="display:block;margin:0 auto" />

<p>This figure compares ROC and PRC between each model.</p>
<p>But when I look at the PRC, the recall values are so low that the curves take on odd shapes. The table above also shows that the F1 score never exceeds 0.5.</p>
<p>The figure below shows all ROC and PRC curves for different models with different dataset ratio.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/278cdcc1-7870-4651-ae89-9b7fb907e837.png" alt="" style="display:block;margin:0 auto" />

<h1>Discussion</h1>
<ul>
<li><p>1:1 ratio dataset did not improve the performance of the model.</p>
<ul>
<li>the imbalanced datasets performed better</li>
</ul>
</li>
<li><p>Traditional statistical modeling can lead to degraded performance, whereas ML aims for broader generalization and is adequate in this case</p>
</li>
<li><p>Limits</p>
<ul>
<li><p>single-center study</p>
<ul>
<li>a heterogeneous, multi-center dataset could help generalize the blood transfusion predictors, which could enhance model performance in general</li>
</ul>
</li>
<li><p>needs more data balancing techniques for model training</p>
<ul>
<li>testing only dataset ratios from 1:1 up to 1:4 has certain limits, and could still mislead the model even when the dataset is well refined and well polished.</li>
</ul>
</li>
</ul>
</li>
</ul>
<h1>Reflection</h1>
<p>The research gave me a broad range of insight into understanding medical data and applying models to it. However, there were several moments when I felt the paper could be higher in quality, with models that could be utilized in the actual CR setting.</p>
<p>I would like to list a few things that I think would improve the models' performance and address some of the limitations the paper mentioned.</p>
<ul>
<li><p>The recall score is extremely low; only precision is high</p>
<ul>
<li><p>I think this comes from the data balance. The balance could be adjusted, or the data could be normalized or preprocessed in advance. Raw data carries risk: some features will dominate the model's parameters and distort performance. Normalizing feature ranges or trimming unnecessary features could help too.</p>
</li>
<li><p>The results indicate that XGBoost performs best, but the recall score suggests the model is of little practical use, even though precision and accuracy are high. The model is heavily biased towards predicting the negative class, since the best model was trained on a ratio other than 1:1. This implies the true positive rate is extremely low: when parturients actually need a transfusion, the model may predict that they do not, which increases the clinical risk.</p>
</li>
</ul>
</li>
<li><p>data split could be 8:1:1 to focus more on the training</p>
<ul>
<li>The dataset was small: only one to two thousand rows would remain if a 50:50 ratio were used for training. Generalizing prediction performance from such scarce data risks a highly biased result. When the dataset is small, more than 60% of the rows should go to training.</li>
</ul>
</li>
</ul>
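<p>The recall trap described above can be made concrete with toy numbers (illustrative only, not the paper's actual confusion matrix): a model that predicts "no transfusion" for everyone on a cohort with ~7% positives still reports high accuracy.</p>

```python
def summarize(tp, fp, fn, tn):
    """Accuracy and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# All-negative classifier on 1,000 patients, 70 of whom truly needed transfusion:
# it misses every transfusion case yet still looks accurate.
accuracy, recall = summarize(tp=0, fp=0, fn=70, tn=930)
```

<p>Accuracy comes out at 0.93 while recall is 0, exactly the pattern of a model biased toward the negative class.</p>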
<h1>Reference</h1>
<p>Lee, S. W., Park, B., Seo, J., Lee, S., &amp; Sim, J. H. (2024). Development of a machine learning approach for prediction of red blood cell transfusion in patients undergoing Cesarean section at a single institution. <em>Scientific Reports</em>, <em>14</em>, 16628. <a href="https://doi.org/10.1038/s41598-024-67784-2">https://doi.org/10.1038/s41598-024-67784-2</a></p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents]]></title><description><![CDATA[I have been looking for a method that will fulfill tasks for extracting data from a long-sequenced and unstructured text using LLM.
If given a pdf file of a research paper, my first approach was to iterate each pages and feed the text data from a pag...]]></description><link>https://ramieeee.me/paper-review-mem1-learning-to-synergize-memory-and-reasoning-for-efficient-long-horizon-agents</link><guid isPermaLink="true">https://ramieeee.me/paper-review-mem1-learning-to-synergize-memory-and-reasoning-for-efficient-long-horizon-agents</guid><category><![CDATA[llm]]></category><category><![CDATA[Reasoning AI Models]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 08 Feb 2026 14:04:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770559617462/bb0e1ce0-7c1b-47fc-a980-8ff7b6ac7503.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have been looking for a method that will fulfill tasks for extracting data from a long-sequenced and unstructured text using LLM.</p>
<p>Given a PDF file of a research paper, my first approach was to iterate over the pages and feed each page's text to an agent. But an agent call is stateless: it has no information about the previous page, which causes data loss when information is split across pages.</p>
<p>I then came up with the idea of a shared memory acting as state, utilized by each agent at every step.</p>
<p>This paper’s goal is to enable agents to perform better reasoning and inference, while also reducing inference time and memory utilization.</p>
<h1 id="heading-1-introduction">1. Introduction</h1>
<ul>
<li><p>Problems arising from traditional long-context data processing with LLMs</p>
<ul>
<li><p>full-context prompting, appending all past turns regardless of their relevance</p>
</li>
<li><p>Growing inference cost and memory usage</p>
</li>
<li><p>Generalization limits beyond the training horizon</p>
</li>
<li><p>Overloaded and inefficient context</p>
</li>
</ul>
</li>
<li><p>Solution</p>
<ul>
<li><p>a model to learn to consolidate its memory as part of its reasoning process</p>
</li>
<li><p>memory to be shared by agents</p>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770538101911/15fef603-358d-4da1-8b3a-740415caa043.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-2-mem1">2. MEM1</h1>
<ul>
<li><p>Annotate each component using XML-style tags</p>
<ul>
<li><p>&lt;IS&gt; for internal state (reasoning)</p>
<ul>
<li><p>summarizes past information</p>
</li>
<li><p>reasons about subsequent actions</p>
</li>
</ul>
</li>
<li><p>&lt;query&gt; for environment queries</p>
</li>
<li><p>&lt;answer&gt; for the agent’s responses</p>
</li>
<li><p>&lt;info&gt; for external observations or tool outputs</p>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770541971425/33033527-d84b-4271-be13-d4c38649a2b5.png" alt class="image--center mx-auto" /></p>
<p>The process shows that $IS_{t-1}$, $Query_{t-1}$, and $Info_{t-1}$ are consolidated into $IS_{t}$. This happens at every step to discard unnecessary data that could otherwise harm inference performance and data quality.</p>
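<p>A toy sketch of this consolidation loop (my own illustration; in MEM1 the merge is performed by RL-trained generation over the tagged prompt, not a fixed rule):</p>

```python
MAX_FACTS = 3  # toy context budget standing in for the learned consolidation

def consolidate(prev_state, new_info):
    """IS_t is rebuilt from IS_{t-1} plus the newest <info>, keeping only a
    bounded set of facts, so the prompt does not grow with the turn count."""
    return (prev_state + [new_info])[-MAX_FACTS:]

def run_episode(observations):
    state = []  # IS_0 starts empty
    for obs in observations:
        state = consolidate(state, obs)
    return state
```

<p>The point of the sketch: per-turn context stays constant-size, which is where the inference-time and memory savings come from.</p>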
<h1 id="heading-3-experiment-amp-results">3. Experiment &amp; Results</h1>
<p>Interestingly, the MEM1 approach showed two key results:</p>
<ul>
<li><p>better at inference time (less than others)</p>
</li>
<li><p>better in match count</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770558287942/e654e196-705e-4722-b2a0-33d9d4f221aa.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-reflection">Reflection</h1>
<p>For our research, the full MEM1 process would be excessive, and it addresses a slightly different topic. Still, it is notable that the paper presents a shared-memory technique to safely pass data to the next agent while casting aside incorrect data.</p>
<p>I will adapt the shared memory in our pipeline; a draft of the pipeline looks something like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770559239501/35159817-30e2-4413-aa18-2b497fad4a0b.png" alt class="image--center mx-auto" /></p>
<p>It may not capture everything accurately, but it seems at least feasible to apply to our research.</p>
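<p>A minimal sketch of that draft (all names are hypothetical; the per-page <code>agent</code> would be an LLM call in practice):</p>

```python
def extract_with_shared_memory(pages, agent):
    """Iterate over pages, passing a shared memory dict between agent calls so
    fields split across page boundaries are not lost (stateless calls drop them)."""
    memory = {}
    for page_text in pages:
        memory.update(agent(page_text, memory))  # agent returns fields found on this page
    return memory

def toy_agent(text, memory):
    """Stand-in agent: collects 'key: value' lines from the page text."""
    found = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            found[key.strip()] = value.strip()
    return found
```

<p>The real agent would also read <code>memory</code> to resolve references that continue from earlier pages.</p>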
<p>Also, considering training a model in our specific domain (Cognitive Reserve), I asked myself, “Can we train in such a way?”, and concluded that we cannot, as CR needs a specific dataset and cannot be trained and generalised in that fashion.</p>
<h1 id="heading-reference">Reference</h1>
<p>[1] Zhou, Z., Qu, A., Wu, Z., Kim, S., Prakash, A., Rus, D., ... &amp; Liang, P. P. (2025). MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents. <em>arXiv preprint arXiv:2506.15841</em>.</p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] Revolutionizing Speaker Recognition and Diarization: A Novel
Methodology in Speech Analysis]]></title><description><![CDATA[It is my first paper review about Speech-to-Text methodology. After reading this paper I tried to dig into the sound and wave world to learn how the digitalized sound data is transformed into the form that humans can understand.
My company is now wor...]]></description><link>https://ramieeee.me/paper-review-revolutionizing-speaker-recognition-and-diarization-a-novel-methodology-in-speech-analysis</link><guid isPermaLink="true">https://ramieeee.me/paper-review-revolutionizing-speaker-recognition-and-diarization-a-novel-methodology-in-speech-analysis</guid><category><![CDATA[STT]]></category><category><![CDATA[Diarization]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 05 Jul 2025 08:52:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751705472244/ee752ae8-56cf-45ca-bcda-95483559518f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It is my first paper review about Speech-to-Text methodology. After reading this paper I tried to dig into the sound and wave world to learn how the digitalized sound data is transformed into the form that humans can understand.</p>
<p>My company is now working on Speech-to-Text and trying to catch up with state-of-the-art techniques to provide better-quality services. Yet we have lacked the research and a solid background in STT, so I picked several research papers, including this one. We need diarization as well as transcription, to identify the speaker. This paper helped me understand what the architecture and algorithms should look like before building such a service.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751702157529/02ec8b7e-fb0a-4342-b69d-56b9469ab27b.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-1-abstract">1. Abstract</h1>
<p>The paper aims to fulfill Speech-to-Text tasks with speaker recognition, so-called diarization, leveraging the Whisper model for transcription, ECAPA-TDNN for speaker embeddings, and Agglomerative Hierarchical Clustering to identify who spoke when.</p>
<h1 id="heading-2-introduction">2. Introduction</h1>
<ul>
<li>why did the authors do this research?</li>
</ul>
<p>In modern society, meeting transcription and audio processing are important. Yet audio-processing accuracy still needs improvement before it can reach service level.</p>
<ul>
<li><p>objects</p>
<ul>
<li><p>identification and segmentation of speakers</p>
</li>
<li><p>content comprehension</p>
</li>
</ul>
</li>
<li><p>with what?</p>
<ul>
<li><p>speaker embeddings</p>
<ul>
<li><p>various acoustic features within speech</p>
</li>
<li><p>discern between speakers</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<h3 id="heading-models-and-packages">Models and Packages</h3>
<p>In this paper the research was conducted using the Whisper model for transcription, the Pyannote toolkit for speaker embeddings, and Agglomerative Hierarchical Clustering for grouping similar embeddings. In short, the key components are listed below.</p>
<ul>
<li><strong>Whisper</strong></li>
</ul>
<p>Whisper was trained on a multilingual supervised dataset so it can differentiate languages, linguistic nuances, and accents. It supports many languages; details are given later below.</p>
<ul>
<li><strong>Pyannote</strong></li>
</ul>
<p>Pyannote is a model for extracting and manipulating speaker embeddings, which encapsulate the unique acoustic characteristics of each speaker. It is the key component for diarization.</p>
<ul>
<li><strong>Agglomerative Hierarchical Clustering</strong></li>
</ul>
<p>The embedded data is then grouped into clusters using this algorithm, after which each cluster can be labeled as a speaker.</p>
<h3 id="heading-in-summary">in summary</h3>
<ul>
<li><p>whisper: transcription of audio data</p>
</li>
<li><p>pyannote: extract embeddings from acoustic features</p>
</li>
<li><p>Agglomerative hierarchical clustering: unveil relationship between speakers</p>
</li>
</ul>
<h1 id="heading-3-related-work">3. Related work</h1>
<p>RNNs and LSTMs were used in diarization research to capture sequential features of audio, but they struggled with long sequences. CNNs were also applied: they excelled at extracting hierarchical spatial features and patterns from spectrograms, but were limited by variable input lengths. Once the Transformer was released, it was adopted for diarization research and applications, as it performed well across many tasks.</p>
<h3 id="heading-embeddings">Embeddings</h3>
<p>Much research has been conducted on extracting features for diarization. X-vectors, based on a time delay neural network (TDNN), are introduced as a state-of-the-art approach for speaker verification.</p>
<h3 id="heading-proposal">Proposal</h3>
<ul>
<li><p>to overcome the challenges and limitations stated above, this paper proposes the methodology of <code>Emphasized Channel Attention, Propagation, and Aggregation (ECAPA-TDNN)</code></p>
<ul>
<li><p>ECAPA-TDNN</p>
<ul>
<li><p>an advanced iteration of TDNN</p>
</li>
<li><p>it uses</p>
<ul>
<li><p>attention mechanism</p>
</li>
<li><p>multilayer feature aggregation (MFA)</p>
</li>
<li><p>squeeze excitation modules</p>
</li>
<li><p>residual blocks</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751704624229/cb253546-62e6-48ee-bcb4-49e4f902c52d.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-k-means">k-means</h3>
<p>k-means is efficient on large datasets but struggles when one speaker is dominant; agglomerative hierarchical clustering can handle such imbalances.</p>
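<p>A bare-bones sketch of the agglomerative idea, using centroid distance on toy 2-D points (real systems cluster high-dimensional speaker embeddings and typically use library implementations):</p>

```python
def centroid(cluster):
    """Mean point of a cluster of n-dimensional tuples."""
    return [sum(axis) / len(cluster) for axis in zip(*cluster)]

def dist(a, b):
    """Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points, threshold):
    """Bottom-up merging: start with one cluster per point and repeatedly join
    the two clusters with the closest centroids until none are within threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best_i, best_j, best_d = 0, 1, float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if d < best_d:
                    best_i, best_j, best_d = i, j, d
        if best_d > threshold:
            break
        clusters[best_i].extend(clusters.pop(best_j))
    return clusters
```

<p>The distance threshold, rather than a fixed cluster count, is what lets AHC cope with an unknown number of speakers.</p>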
<h1 id="heading-4-methodology">4. Methodology</h1>
<h3 id="heading-whisper">Whisper</h3>
<p>A notable point about the Whisper model is that it is built on the transformer architecture, with an encoder and a decoder. It supports 99 languages, with a word error rate (WER) of 4.2%. Korean is much more complex to score and cannot be measured well by WER, as its spacing and letter-combination system differ; the character error rate (CER) is adopted instead to check performance.</p>
<ul>
<li><p>4.2% WER</p>
</li>
<li><p>99 languages</p>
</li>
<li><p>680,000h audio (online platform)</p>
<ul>
<li><p>563,000h english</p>
</li>
<li><p>117,000h other languages</p>
</li>
</ul>
</li>
<li><p>robustness against accents, ambient disturbances</p>
</li>
</ul>
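<p>WER and CER share the same edit-distance core and differ only in the token unit (words for English, characters for scripts like Korean); a compact sketch:</p>

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (r != h))    # substitution (free on match)
            prev = cur
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: edit distance over word tokens."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: edit distance over characters."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

<p>Because Korean packs several letters into one syllable block and spaces inconsistently, a single substituted block inflates WER far more than CER, which is why CER is preferred for Korean.</p>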
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749974516385/53e6040d-69a8-41bc-9e7b-e8005f44aa41.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>currently has large-v3 and turbo model</p>
</li>
<li><p>utilizes encoder-decoder transformer</p>
<ul>
<li><p>encoder: derives a latent representation from speech</p>
</li>
<li><p>decoder: generates text, based on the latent representation</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-other-speech-to-text-models">Other Speech-to-Text models</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749883556181/908164d3-4036-4d08-9986-90c7ea6d677e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-suggested-algorithm">Suggested algorithm</h3>
<p>The audio file is processed in 16 kHz PCM format, normalized to the range -1 to 1. The signal is then converted to an 80-channel Mel spectrogram, 80 channels being the most common choice, accepted from experience.</p>
<h3 id="heading-encoder-decoder">Encoder, decoder</h3>
<ul>
<li><p>conversion involves</p>
<ul>
<li><p>window size: 25ms</p>
</li>
<li><p>stride: 10ms</p>
</li>
<li><p>segments: 30s</p>
</li>
</ul>
</li>
<li><p>encoder operates per 30-second segment, to extract features</p>
<ul>
<li><p>it involves two GELU activated convolutions</p>
<ul>
<li>filter size of 3 for input embeddings</li>
</ul>
</li>
<li><p>position embedding uses Sine function</p>
<ul>
<li>performed by transformer</li>
</ul>
</li>
</ul>
</li>
<li><p>decoder</p>
<ul>
<li><p>calculates probability based on the latent representation</p>
</li>
<li><p>token determination via Greedy Search or Beam Search</p>
</li>
<li><p>output: maximum 224 tokens per 30-second segment</p>
</li>
</ul>
</li>
</ul>
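<p>The framing parameters above (25 ms window, 10 ms stride, 30-second segments at 16 kHz) can be sanity-checked with a little arithmetic (my own sketch; values are from the paper):</p>

```python
SR = 16_000              # 16 kHz PCM input
WIN = int(0.025 * SR)    # 25 ms window  -> 400 samples
HOP = int(0.010 * SR)    # 10 ms stride  -> 160 samples
SEGMENT = 30 * SR        # 30-second segment -> 480,000 samples

def n_frames(n_samples, win=WIN, hop=HOP):
    """Number of full analysis windows that fit into the signal."""
    return 1 + (n_samples - win) // hop
```

<p>That gives roughly 3,000 feature frames per 30-second segment, which is the sequence the encoder attends over.</p>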
<h3 id="heading-process-in-short">Process in short,</h3>
<ol>
<li><p>transcription</p>
<ul>
<li>spoken content (Whisper)</li>
</ul>
</li>
<li><p>speaker embeddings</p>
<ul>
<li><p>speaker embeddings from audio (unique features of individuals)</p>
<ul>
<li>bases for analysis</li>
</ul>
</li>
</ul>
</li>
<li><p>clustering</p>
<ul>
<li>clustering using Agglomerative hierarchical clustering method based on similarity</li>
</ul>
</li>
<li><p>output</p>
<ul>
<li><p>will be able to see who spoke when</p>
<ul>
<li><p>Whisper model’s output has time information with transcription</p>
</li>
<li><p>audio will be cut according to the time information and determines the speaker</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
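<p>Step 4 above, attaching speakers to transcript segments via their timestamps, can be sketched as simple overlap matching (hypothetical glue code, not the paper's exact implementation):</p>

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Attach a speaker label to each transcribed segment by maximum time
    overlap with the diarization turns."""
    labeled = []
    for start, end, text in transcript_segments:         # (start, end, text) from STT
        best_spk, best_overlap = None, 0.0
        for turn_start, turn_end, spk in speaker_turns:  # (start, end, speaker) from clustering
            overlap = max(0.0, min(end, turn_end) - max(start, turn_start))
            if overlap > best_overlap:
                best_spk, best_overlap = spk, overlap
        labeled.append((start, end, best_spk, text))
    return labeled
```

<p>This is where Whisper's per-segment timestamps and the clustering output meet to produce "who spoke when".</p>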
<h3 id="heading-dataset">Dataset</h3>
<p>The paper used the VoxCeleb1 and VoxCeleb2 datasets, collected from YouTube videos of celebrities. VoxCeleb is widely used in research because it contains a range of voices, from clean recordings to voices with background noise.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751704693562/d7bc77d9-fbbf-4256-aaa5-509c7487969e.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-reflection">Reflection</h1>
<p>It was indeed intriguing and informative as a kick-off into the STT field. What was lacking, however, was detail on how the digitalized sound is transformed into a Mel spectrogram and how the data is processed; overall, the explanation of the process was a bit of a let-down.</p>
<p>Secondly, the paper shows result data comparing two models, yet it does not compare train and test error rates, so there is no way to tell whether the model was overfitted.</p>
<p>ECAPA-TDNN will be a good choice for diarization if the number of speakers is fixed in every situation. But in real life there are always exceptions: in a meeting, a new member may join and an existing member may leave. I personally think the embeddings and clustering should account for such situations, where the embedding information can change at any time.</p>
<p>With that realization, the next paper to read will be the seminal paper by the Oxford researchers who established the VoxCeleb dataset, or something else if another interesting paper comes along.</p>
<h1 id="heading-reference">Reference</h1>
<p>[1] R. D. Shankar, R. B. Manjula, and R. C. Biradar, "Revolutionizing Speaker Recognition and Diarization: A Novel Methodology in Speech Analysis," <em>SN Computer Science</em>, vol. 6, no. 87, 2025. <a target="_blank" href="https://doi.org/10.1007/s42979-024-03509-6">https://doi.org/10.1007/s42979-024-03509-6</a></p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] Training a Helpful and Harmless Assistant withReinforcement Learning from Human Feedback]]></title><description><![CDATA[Since I have joined a team which deals with AI and LLMs, I have decided to review a paper in relation to an LLM which deals with reinforcement learning of LLM and how it turns out to be better than the zero-shot learning.
It had been only 3 days in t...]]></description><link>https://ramieeee.me/paper-review-training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback</link><guid isPermaLink="true">https://ramieeee.me/paper-review-training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback</guid><category><![CDATA[Reinforcement Learning]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 25 May 2025 08:04:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748160260606/de66987d-98f3-4fb6-bfdd-937f4029b4bc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Since joining a team that works with AI and LLMs, I decided to review a paper on reinforcement learning for LLMs and how it turns out to be better than zero-shot learning.</p>
<p>It had been only 3 days since I joined the team, but I needed to figure out my pathway and which skills I would develop and carry.</p>
<p>There are loads of AI domains I would like to get involved in, such as algorithm design, regression research, or LLMs, but since our team currently focuses on LLMs and fine-tuning, I decided to study further how to evaluate LLM outputs and apply reinforcement learning to them.</p>
<p>Anthropic conducted research on Reinforcement Learning from Human Feedback (RLHF), and it is intriguing.</p>
<h1 id="heading-1-introduction">1. Introduction</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748156668382/475b589f-0506-46ac-9c7b-e215be04dca6.png" alt class="image--center mx-auto" /></p>
<p>Learning from human feedback is not difficult to understand: there are two datasets sorted by humans, and a model is fine-tuned on them.</p>
<h3 id="heading-datasets">Datasets</h3>
<ul>
<li><p>Helpfulness</p>
<ul>
<li>answering, writing, editing, documents etc.</li>
</ul>
</li>
<li><p>Harmlessness</p>
<ul>
<li>not assisting harmful goals, like bank robbery</li>
</ul>
</li>
</ul>
<p>These two datasets will be categorised by humans and it is totally up to the crowdworkers to decide which category the text falls into.</p>
<h3 id="heading-trade-off">Trade-off</h3>
<p>An interesting result though, when the model learns from a single dataset, which is either the helpful dataset or the harmless dataset, it has got a tendency to show the trade-offs in the scores.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748157548045/0970820f-abc7-4a54-97cd-b6bd61e203f2.png" alt class="image--center mx-auto" /></p>
<p>As shown above, the green triangle plot(Online Helpful RLHF) scores top with Elo score on the left, yet on the right it is least preferred by crowdworkers when scoring for harmlessness. Technically, a bias is formed when only a single dataset is used for training the model.</p>
<p>When trained with the two datasets, both helpful and harmless datasets, it shows a meaningful result that the few-shot accuracy shows better performance than the zero-shot accuracy in general NLP performance tests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748158086082/499df7af-b7f1-400c-b76f-1999eab58f5a.png" alt class="image--center mx-auto" /></p>
<p>The graphs indicate that the bigger models the better performance the models show in general.</p>
<h1 id="heading-2-suggested-rlhf">2. Suggested RLHF</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748158384034/cc097bb3-f3c1-4d20-832d-633b4e68dfb5.png" alt class="image--center mx-auto" /></p>
<p>This is the process of RLHF, from data collection to reinforcement learning. It was somewhat complicated for me to grasp as a whole, so I summarised the entire process in my own words. In a nutshell, there are roughly three steps, as below.</p>
<h3 id="heading-1-ab-test">1) AB test</h3>
<ul>
<li>crowdworker determines the outputs.</li>
</ul>
<h3 id="heading-2-preference-model">2) Preference Model</h3>
<ul>
<li><p>input: prompt, AB answers, preferred answer(human feedback)</p>
</li>
<li><p>output: scores</p>
</li>
</ul>
<h3 id="heading-3-rlhf">3) RLHF</h3>
<ul>
<li>input: prompt, RLHF answer, PM score</li>
</ul>
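<p>The preference model step is commonly trained with a pairwise (Bradley-Terry style) loss that pushes the score of the human-preferred answer above the rejected one; a sketch of that standard formulation (my summary, not code from the paper):</p>

```python
import math

def preference_loss(score_chosen, score_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the PM scores the
    human-preferred answer higher, large when it prefers the rejected one."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

<p>The RLHF stage then uses the trained PM's score as the reward signal for the policy.</p>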
<h1 id="heading-3-evaluation">3. Evaluation</h1>
<p>The standard NLP test methods are applied to check the model, but as mentioned above, when the models are fine-tuned with the datasets (helpful and harmless), the bias is formed.</p>
<p>Test methods are as below.</p>
<ul>
<li><p>MMLU: Benchmark covering many domains with high-level questions (history, law, medicine, etc.)</p>
</li>
<li><p>LAMBADA: A task to predict the last word</p>
</li>
<li><p>HellaSwag: A task to choose an appropriate context</p>
</li>
<li><p>OpenBookQA: Basic science knowledge</p>
</li>
<li><p>ARC-Easy: Basic science knowledge (easy questions)</p>
</li>
<li><p>ARC-Challenge: higher-level questions which require reasoning</p>
</li>
<li><p>TriviaQA: trivia questions collected from the internet</p>
</li>
</ul>
<h1 id="heading-4-reflection">4. Reflection</h1>
<p>The Anthropic team indicated that the model shows better helpfulness scores when evaluated by humans, but they are uncertain why. They assume it is because of the datasets, but further research is required to determine whether the datasets lack correctness for fine-tuning.</p>
<h3 id="heading-evaluation">Evaluation</h3>
<p>I have always wondered how to assess a model in a specific domain when an LLM is applied. Should a human give feedback on the answers the LLM produces? This paper at least shows that humans (crowdworkers) did the job of sorting the datasets. From this research I took the idea that human feedback is essential in output evaluation, even though it is time-consuming and costly, human labor being the most expensive resource.</p>
<h3 id="heading-datasets-1">Datasets</h3>
<p>A dataset in the range of 100-500k examples would be ideal for fine-tuning, I thought. The paper uses static and online datasets; since it was released in 2022, I think there are now better ideas and methods for building datasets using LLMs and evaluating them.</p>
<h1 id="heading-5-reference">5. Reference</h1>
<p>[1] Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., et al. (2022). <em>Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback</em>. arXiv preprint arXiv:2204.05862.</p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] Object Recognition and Positioning with Neural Networks: Single Ultrasonic Sensor Scanning Approach]]></title><description><![CDATA[It is not all of a sudden that I was intrigued by the subject Ultrasonic scanning to detect objects or classify the objects. I once randomly thought about what I would choose for PhD paper if I had to go through PhD course, then ultrasound was the ke...]]></description><link>https://ramieeee.me/paper-review-object-recognition-and-positioning-with-neural-networks-single-ultrasonic-sensor-scanning-approach</link><guid isPermaLink="true">https://ramieeee.me/paper-review-object-recognition-and-positioning-with-neural-networks-single-ultrasonic-sensor-scanning-approach</guid><category><![CDATA[coordinates]]></category><category><![CDATA[Ultrasonic]]></category><category><![CDATA[#waves]]></category><category><![CDATA[classification]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 27 Apr 2025 10:59:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751548846/89d4c5d5-83df-4efa-95f7-3cec2c1f283e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It was no sudden whim that I became intrigued by <em>ultrasonic</em> scanning to detect or classify objects. I once idly wondered what topic I would choose if I ever pursued a PhD, and ultrasound was the key idea that popped into my head. To strengthen the idea, I searched the web for papers and found this one. My ideas about <em>ultrasound</em> were these:</p>
<ul>
<li><p>has to be able to detect objects</p>
</li>
<li><p>reads the environment and could tell it by the signal</p>
</li>
<li><p>related to scanning the environment and able to visualize it</p>
</li>
<li><p>low cost</p>
</li>
</ul>
<p>Low cost matters, of course, because this is a personal project; it would be a different story if I were part of a university PhD course, which I am not, so the cost matters.</p>
<p>I had a thorough review of this paper by Turkish researchers; according to the paper, the authors belong to the Defense Industries Research and Development Institute of Turkey, which gave me confidence that this paper would be meticulous and analytical.</p>
<h1 id="heading-1-introduction">1. Introduction</h1>
<h3 id="heading-why-ultrasonic-sensor">Why Ultrasonic sensor?</h3>
<ul>
<li><p>Low cost compared to LIDAR</p>
<p>  The first and most critical reason for using ultrasonic sensing is cost. Compared to an ultrasonic sensor, other sensors are quite heavy in price; LIDAR, for example, is a great sensor with stunning visualization and object detection, yet it is expensive and hard for individuals to access.</p>
</li>
<li><p>Useful when optical sensing is not possible</p>
<p>  LIDAR, again, is strong when light can travel to objects. However, when optical sensing is disturbed by conditions such as weather, it may not be able to collect information from objects; there will be noise, and the sensing task will not be fulfilled completely.</p>
</li>
</ul>
<h3 id="heading-aim">Aim</h3>
<p>The paper aims to provide methodologies for 1) object classification and 2) coordinate estimation with a single ultrasonic sensor, to be utilized in robotic or human applications such as helmets scanning objects in real time.</p>
<h1 id="heading-2-proposed-methods">2. Proposed methods</h1>
<h3 id="heading-data-collection">Data Collection</h3>
<p>The authors placed the sensor on a 3D printer for automated data collection. Every 2 mm along the X direction of the 3D printer, data was collected through the ultrasonic sensor, across 116 different scenarios.</p>
<h3 id="heading-dataset">Dataset</h3>
<p>Three objects in three classes were used for the dataset: a large object (40 mm), a medium object (20 mm) and a narrow object (10 mm).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751384243/7c1ec74c-7429-4357-b714-0fbd7f4b0540.png" alt class="image--center mx-auto" /></p>
<p>The process of the ultrasound sensor is as below.</p>
<ol>
<li><p>The transmitter sends a signal to the objects</p>
</li>
<li><p>The receiver takes the reflected signal and processes it through an amplifier</p>
</li>
<li><p>The signal is digitized</p>
</li>
<li><p>The digital data is sent over USB</p>
</li>
</ol>
<h3 id="heading-algorithm">Algorithm</h3>
<p>A CNN was applied to process the signal information; this was not a multi-modal model. The classic CNN model consists of three 2D convolutional layers and three max-pooling layers, followed by a flatten layer. A dense layer takes the feature vector: softmax classifies the objects, and another dense layer solves a linear regression for coordinate estimation.</p>
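<p>To make the shrinking of the feature maps concrete, here is a small sketch of how a square input would pass through three conv + max-pool stages. The input size (64 x 64), kernel size (3 x 3), stride and pool size (2 x 2) are my assumptions for illustration, not values taken from the paper.</p>

```python
def conv2d_size(n, kernel=3, stride=1, padding=0):
    """Output side length of a square 2D convolution."""
    return (n + 2 * padding - kernel) // stride + 1

def maxpool_size(n, pool=2):
    """Output side length of a square max-pooling layer."""
    return n // pool

n = 64  # assumed square input size
for stage in range(3):       # the 3 conv + 3 max-pool stages
    n = conv2d_size(n)       # conv shrinks the side by kernel - 1
    n = maxpool_size(n)      # pooling halves the side
    print(f"after stage {stage + 1}: {n}x{n}")

# the flatten layer then feeds two dense heads:
# one softmax head (class) and one linear head (coordinates)
flat = n * n
```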
<h3 id="heading-preprocessing">Preprocessing</h3>
<p>Data was collected by placing the objects at a certain position on the y-axis and at random positions between 0 and 40 cm on the x-axis, measuring every 2 mm with the ultrasonic sensor; a 5 mm step would be too coarse and miss some information. The data then looks like a fluctuation on a 2D graph where the object reflection is detected. The envelope of the signal is then extracted from the raw data. When the two features are combined, a pillar-looking shape appears on the graph, seemingly very disheveled with a lot of noise.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751412822/73e30f45-9b5e-4e10-b1cd-1c5f4d997f30.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751420906/ef524672-f44d-41fa-9277-b11d3db5936d.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751436083/89293072-8400-4e88-8a75-6ef6383af71c.png" alt class="image--center mx-auto" /></p>
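<p>As a rough illustration of the envelope-extraction step, here is a sketch using rectification followed by a sliding-window maximum. The paper's exact envelope method may differ, and the echo signal below is synthetic.</p>

```python
import numpy as np

def envelope(signal, window=16):
    """Upper envelope: rectify, then take a sliding-window maximum."""
    rectified = np.abs(signal)
    half = window // 2
    padded = np.pad(rectified, (half, half), mode="edge")
    return np.array([padded[i:i + window].max() for i in range(len(signal))])

# synthetic echo: a decaying burst centred around sample 200
t = np.arange(1000)
raw = np.sin(2 * np.pi * t / 8) * np.exp(-(((t - 200) / 50.0) ** 2))

env = envelope(raw)  # smooth outline of the burst, as in the figures above
```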
<h3 id="heading-normalization">Normalization</h3>
<p>There are two normalization processes.</p>
<ul>
<li><p>0-255 to 0-1 scale</p>
<p>  This is necessary in image preprocessing: the model takes inputs with values from 0 to 1 rather than large values, which saves machine memory and keeps training stable, as the error could grow intimidatingly when MSE is applied for error calculation.</p>
</li>
<li><p>Resize the input and convert to grayscale</p>
<p>  The picture should be resized to a fixed format, such as 12 x 12, so the model takes the input accordingly. Grayscale conversion was also applied in this algorithm.</p>
</li>
</ul>
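<p>The two normalization steps can be sketched roughly as below. The 12 x 12 target size is the example mentioned above; the luminance weights and the nearest-neighbour resize are my own assumptions, not the paper's.</p>

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse 3 channels with ITU-R BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=12):
    """Nearest-neighbour resize of a square single-channel image."""
    idx = np.arange(size) * img.shape[0] // size
    return img[np.ix_(idx, idx)]

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(48, 48, 3)).astype(np.float64)

gray = to_grayscale(rgb)      # (48, 48), values still 0-255
small = resize_nearest(gray)  # (12, 12), the fixed input format
scaled = small / 255.0        # 0-255 rescaled to the 0-1 range
```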
<h1 id="heading-3-key-point">3. Key Point</h1>
<p>Divided into big categories, they are as below.</p>
<ul>
<li><p>Data collection</p>
</li>
<li><p>Data processing</p>
</li>
<li><p>CNN to determine values</p>
</li>
</ul>
<p>It focuses on a CNN performing multiple tasks, and this differentiates the paper from other research: a traditional CNN could perform such tasks with ultrasonic image data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751480387/d87c259b-a676-4d17-b10e-c1f408f7c90c.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-4-reflection">4. Reflection</h1>
<p>The paper is indeed outstanding in detailing how the data was prepared and processed. It may be a barrier, though, that without knowledge of sound waves, wave-related graphs, and the devices used for wave detection, this would be quite a handful of a paper to review. The paper introduces the modules and models of the devices; various kinds of these were new to me, as I am not familiar with sound waves or anything related to this domain.</p>
<p>I would need to see many more ultrasonic studies on environmental conditions like roads, metals or other materials.</p>
<h3 id="heading-improvements">Improvements</h3>
<ul>
<li><p><strong>Score</strong></p>
<p>  The model itself was great at detecting objects and estimating their coordinates. However, the F1-score for multiple-object detection was unexpectedly low: although the first object scored over 90%, from the second object onward the score decreased significantly, some dropping to 79%. Such a score could be called adequately reliable, but it also shows enough error that the method may not be appropriate for adoption in industry.</p>
</li>
<li><p><strong>Number of Sensor</strong></p>
<p>  The aim of the paper is a single-sensor algorithm, and it was brilliant enough to show the possibility of object detection using only one ultrasonic sensor. However, it has limitations when addressing many objects with a single sensor. As mentioned in the paper, when Gaussian noise and salt-and-pepper noise were applied to the test dataset to mimic real-world conditions, the score dropped to 70% for the third object. I strongly believe that adding more sensors to this research would improve the quality of the image processing and of the sensing data.</p>
</li>
<li><p><strong>Test method</strong></p>
<p>  I noticed that the test was conducted by placing objects at a certain y-axis position, not at random points. This could be a limitation of the suggested method. It would be more reliable if the test were conducted with various positions on both the x and y axes.</p>
</li>
</ul>
<h1 id="heading-5-reference">5. Reference</h1>
<p>[1] Karagoz A., Dindis G., <em>Object Recognition and Positioning with Neural Networks: Single Ultrasonic Sensor Scanning Approach</em>, Sensors 2025, 25(4), 1086, <a target="_blank" href="https://doi.org/10.3390/s25041086">https://doi.org/10.3390/s25041086</a> (CC BY 4.0)</p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] DeepSeek-R1 Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]]></title><description><![CDATA[It has not been a long time since DeepSeek was released. It was indeed a shock to those who are in AI industry.
I was not familiar with LLM’s algorithm and the computing resource usage of the LLMs. All I was doing was to utilise the LLM APIs for deve...]]></description><link>https://ramieeee.me/paper-review-deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning</link><guid isPermaLink="true">https://ramieeee.me/paper-review-deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning</guid><category><![CDATA[Deepseek]]></category><category><![CDATA[llm]]></category><category><![CDATA[Reinforcement Learning]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 20 Apr 2025 12:11:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745151315420/5c785e0c-f3b5-4df1-97fa-a420779b9422.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It has not been a long time since DeepSeek was released. It was indeed a shock to those who are in AI industry.</p>
<p>I was not familiar with LLM algorithms or the computing resource usage of LLMs. All I was doing was utilising the LLM developer APIs to build pipelines for automation. Other than that, there was not much to consider.</p>
<p>When DeepSeek struck the LLM industry, people eagerly talked about its algorithm and how light the model is compared to others such as GPT from OpenAI and Llama from Meta.</p>
<p>I would like to meticulously analyze DeepSeek, especially how it was designed to be distilled into open-source models and how the datasets were used to train the model.</p>
<h1 id="heading-1-abstract">1. Abstract</h1>
<p>In the paper, two reasoning models are introduced: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is the model before supervised fine-tuning. In the abstract, the two models are summarised with the features below.</p>
<ul>
<li><p><strong>DeepSeek-R1-Zero</strong></p>
<ul>
<li><p>without supervised fine-tuning (SFT)</p>
</li>
<li><p>pros</p>
<ul>
<li><p>remarkable reasoning capabilities</p>
</li>
<li><p>naturally emerges with numerous powerful and intriguing reasoning behaviors</p>
</li>
</ul>
</li>
<li><p>cons</p>
<ul>
<li><p>poor readability (humans can hardly understand the output)</p>
</li>
<li><p>language mixing</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>DeepSeek-R1</strong></p>
<ul>
<li><p>developed to overcome the shortcomings of DeepSeek-R1-Zero</p>
</li>
<li><p>incorporates multi-stage training and cold-start data before RL.</p>
</li>
<li><p>comparable to OpenAI-o1-1217</p>
</li>
</ul>
</li>
</ul>
<hr />
<p>According to the paper, the benchmark performance of DeepSeek-R1 surpasses OpenAI-o1-mini and shows similar performance to OpenAI-o1-1217. The performance metrics are in the table below.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>AIME2024</td><td>Codeforces</td><td>GPQA Diamonds</td><td>MATH-500</td><td>MMLU</td><td>SWE-bench Verified</td></tr>
</thead>
<tbody>
<tr>
<td>DeepSeek-R1</td><td>79.8</td><td>96.3</td><td>71.5</td><td>93.7</td><td>90.8</td><td>49.2</td></tr>
<tr>
<td>OpenAI-o1-1217</td><td>79.2</td><td>96.6</td><td>75.7</td><td>96.4</td><td>91.8</td><td>48.9</td></tr>
<tr>
<td>DeepSeek-R1-32B</td><td>72.6</td><td>90.6</td><td>62.1</td><td>94.3</td><td>87.4</td><td>36.8</td></tr>
<tr>
<td>OpenAI-o1-mini</td><td>63.6</td><td>93.4</td><td>60.0</td><td>90.0</td><td>85.2</td><td>41.6</td></tr>
<tr>
<td>DeepSeek-V3</td><td>39.2</td><td>58.7</td><td>59.1</td><td>90.2</td><td>88.5</td><td>42.0</td></tr>
</tbody>
</table>
</div><h1 id="heading-2-introduction">2. Introduction</h1>
<p>The paper kicks off by mentioning that OpenAI's o1 was the first model introduced for reasoning tasks, but that effective test-time scaling remains a challenge and other works have not been comparable to the o1 models.</p>
<ul>
<li><p>Why was the DeepSeek-R1 model developed?</p>
<ul>
<li>To improve language model reasoning capabilities with RL(reinforcement learning)</li>
</ul>
</li>
<li><p>How?</p>
<ul>
<li><p>DeepSeek-R1-Zero: DeepSeek-V3-Base's outputs pass through GRPO, which updates the parameters in accordance with the scores</p>
</li>
<li><p>DeepSeek-R1: RL with cold-start data and a multi-stage training pipeline</p>
</li>
</ul>
</li>
</ul>
<p>Basically, this paper emphasises the fact that an LLM can be optimised and incentivised through RL without supervised fine-tuning (SFT).</p>
<h1 id="heading-3-approach">3. Approach</h1>
<p>In chapter 2, the DeepSeek team describes the algorithm Group Relative Policy Optimisation (GRPO). This algorithm trains the model by optimising outputs, comparing a new policy against an old policy. The paper also says the critic model is forgone; it would be the same size as the policy model, which I think is one of the factors that makes DeepSeek-R1-Zero lighter, as it removes a significant amount of computation.</p>
<p>The GRPO equation is shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743841353112/95ca054a-d713-4f37-a29d-b9804038be5c.png" alt class="image--center mx-auto" /></p>
<p>The gist of the algorithm is to sample outputs from the old policy and calculate their probabilities. The ratio of the new policy's output probability to the old policy's is then <em>clipped</em> within the range \( [1-\epsilon,\ 1+\epsilon] \) to prevent too much bias toward either the new or the old policy.</p>
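<p>To see the clipping in action, here is a toy sketch of the clipped surrogate term with group-relative advantages. All the numbers (rewards, log-probabilities, epsilon) are made up for illustration, and the KL-penalty term of the full GRPO objective is omitted.</p>

```python
import numpy as np

def grpo_clipped_term(logp_new, logp_old, rewards, eps=0.2):
    """Clipped surrogate term for one group of sampled outputs."""
    # group-relative advantage: normalise rewards within the group
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = np.exp(logp_new - logp_old)          # pi_new / pi_old per output
    clipped = np.clip(ratio, 1 - eps, 1 + eps)   # keep the ratio near 1
    # pessimistic minimum of the unclipped and clipped surrogate terms
    return np.minimum(ratio * adv, clipped * adv).mean()

logp_old = np.array([-1.0, -2.0, -0.5, -1.5])   # made-up log-probs
logp_new = np.array([-0.8, -2.1, -0.4, -1.6])
rewards = np.array([1.0, 0.0, 1.0, 0.0])        # made-up group rewards

obj = grpo_clipped_term(logp_new, logp_old, rewards)
```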
<h3 id="heading-rewards">Rewards</h3>
<p>There are two reward methods on which the model relies. The model takes a reward when its output scores meaningfully, and it leverages the rewards to update its weights in the direction evaluated as good, whereas low-scoring behaviour is refined away so only the useful behaviour remains.</p>
<ul>
<li><p>accuracy rewards: evaluate whether the response is correct, e.g. math problem results or classification results.</p>
</li>
<li><p>format rewards: enforce the model to put its reasoning process inside tags like '&lt;think&gt;&lt;/think&gt;'</p>
</li>
</ul>
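<p>A format reward of this kind can be sketched with a simple check; the pattern and the reward values here are my assumptions, not the paper's exact implementation.</p>

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the reasoning sits in a single <think>...</think>
    block followed by an answer, else 0.0 (assumed reward values)."""
    pattern = r"^<think>.+?</think>.+$"
    return 1.0 if re.match(pattern, response, re.DOTALL) else 0.0

good = "<think>2 + 2 = 4 because ...</think> The answer is 4."
bad = "The answer is 4."   # no reasoning tags, so no reward
```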
<h3 id="heading-how-test-was-done">How Test Was Done</h3>
<ul>
<li>AIME accuracy: For each question, 16 answers were selected and the overall average accuracy was calculated</li>
</ul>
<h1 id="heading-4-key-point">4. Key Point</h1>
<p>The DeepSeek-R1-Zero model showed that it is genuinely learning autonomously: the more training steps it takes, the more time it takes to respond, which means it thinks more before responding.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743947698073/8f2f53b5-25e7-4742-a8ab-36b761f5abca.png" alt class="image--center mx-auto" /></p>
<p>Through its output after the GRPO calculation, the model teaches itself with the knowledge it produces, taking it back to strengthen what it is sure about. It even reaches an "aha" moment, when it realises by itself what the knowledge should be or is meant to be.</p>
<p>Yet still, DeepSeek-R1-Zero's reasoning process has the drawback of <em>poor readability</em>. If humans cannot understand the outputs its weights produce, they are useless. This is the reason why DeepSeek-R1 is designed to perform with robust readability.</p>
<h3 id="heading-deepseek-r1">DeepSeek-R1?</h3>
<p>The DeepSeek-R1 model is supervised-fine-tuned on a small amount of data sampled from DeepSeek-R1-Zero's output. Human annotators filtered and sampled the data used for DeepSeek-R1, enforcing better readability and reasoning ability. Approximately 600k reasoning-related training samples and 200k non-reasoning training samples were collected to feed the model.</p>
<h3 id="heading-process-in-summary">Process in Summary</h3>
<p>The process in this paper for reinforcement learning of the models is summarised as below.</p>
<ul>
<li>DeepSeek-V3-Base -&gt; DeepSeek-R1-Zero -&gt; DeepSeek-R1(with Zero's data for cold-start) -&gt; DeepSeek-R1(with RL as Zero model did) -&gt; Distillation</li>
</ul>
<h3 id="heading-how-test-was-done-1">How Test Was Done</h3>
<p>An intriguing part is the testing. There are many methods for testing LLM performance in specific domains like math and reasoning. The paper states that 16 answers were sampled per question and the overall average accuracy was calculated.</p>
<h1 id="heading-5-reflection">5. Reflection</h1>
<p>This was my first LLM paper review. I felt awkward with the terms used in the paper like checkpoint, RL for LLMs, rewards and so on. It had been quite a challenging, yet motivating, read. It was also thrilling that the model could learn by itself, choosing rewards and keeping the right outputs so it could develop its own universe of tensors. I once doubted whether developing LLMs with vectors and tensors could be the wrong methodology and whether there are other ways to build better-performing models, but then I realised that numbers and floats in tensors are the most efficient way to store and calculate for an LLM, and maybe it is by nature meant to be this way. A <em>number</em> is, for now, the only way to express the data a cell holds.</p>
<p>Still, there is the limitation that the model is robust only in English and Chinese, since the base model was trained mainly on these two languages; they were the only languages in which it performed well. The paper mentions that future work is to add more languages so the model is not restricted to only a few.</p>
<p>My personal interest in LLMs, if there should be one, is making an sLLM with low computational resource requirements and faster performance, so it could be adopted in any circumstance, whether on hardware such as embedded devices or in any domain.</p>
<h1 id="heading-6-reference">6. Reference</h1>
<p>[1] DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., et al. (2024). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 80 Remove Duplicates from Sorted Array II]]></title><description><![CDATA[Understanding the Problem
An integer list of nums has numbers that could be either duplicate or non-duplicate. It is to leave 1-2 values for each number in the list in-place, which means it is nothing to do with returning any value from the method.
#...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-80-remove-duplicates-from-sorted-array-ii</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-80-remove-duplicates-from-sorted-array-ii</guid><category><![CDATA[leetcode]]></category><category><![CDATA[string]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Wed, 02 Apr 2025 04:31:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743522203673/04276cb7-7df4-481b-91df-ba9a5d4a4f3f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>An integer list <code>nums</code> has numbers that may or may not be duplicated. The task is to leave 1–2 occurrences of each number in the list, in-place; the point is modifying the list itself, not returning a new one.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: nums = [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>]
Output: nums = [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>] <span class="hljs-comment"># it is not a returned value. Must fix nums list from the parameter</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I had experience removing duplicates from a given list, and this was pretty much the same, but this time each unique element was allowed up to 2 occurrences.</p>
<p>I was going to get the unique values with <code>set()</code> and iterate over them.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>The problem solving steps are as below.</p>
<ol>
<li><p>I get the unique values into <code>unique_n</code> with <code>set(nums)</code></p>
</li>
<li><p>Then I check each unique value <code>n</code></p>
</li>
<li><p>If <code>n</code> occurs more than twice, remove occurrences in a <code>while</code> loop</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">removeDuplicates</span>(<span class="hljs-params">self, nums: List[int]</span>) -&gt; int:</span>
        unique_n = set(nums)
        <span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> unique_n:
            <span class="hljs-keyword">while</span> nums.count(n) &gt; <span class="hljs-number">2</span>:
                nums.remove(n)
        <span class="hljs-keyword">return</span> len(nums)  <span class="hljs-comment"># the judge expects k; the list is already truncated</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>This solution is not the best, I could tell, as the runtime came out at 4017 ms! An amusing number I had not seen before. There are two loops, <code>for</code> and <code>while</code> (and <code>count()</code> and <code>remove()</code> each scan the list), so it will be close to <em>O(n²)</em> at worst.</p>
<p>People had it done in a different way, with two pointers.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">removeDuplicates</span>(<span class="hljs-params">self, nums: List[int]</span>) -&gt; int:</span>

        k = <span class="hljs-number">2</span>

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">2</span>, len(nums)):
            <span class="hljs-keyword">if</span> nums[i] != nums[k - <span class="hljs-number">2</span>]:
                nums[k] = nums[i]
                k += <span class="hljs-number">1</span> 

        <span class="hljs-keyword">return</span> k
</code></pre>
<p>With this solution the <code>(k-2)</code>th and <code>i</code>th values are compared. The code can be visualised step by step like this.</p>
<pre><code class="lang-python"># step 1 (note: the loop starts at i = 2, and k starts at 2)
nums = [1, 1, 1, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
              i
              k
# i is 2 and k is 2
# nums[i] != nums[k - 2] compares 1 != 1 -&gt; False, nothing happens

# step 2
nums = [1, 1, 1, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
              k  i
# i is 3 and k is 2
# nums[i] != nums[k - 2] compares 2 != 1 -&gt; True
# nums[k] = nums[i] writes 2 into index 2, then k becomes 3
nums = [1, 1, 2, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
                 k  i

# step 3
nums = [1, 1, 2, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
                 k  i
# i is 4 and k is 3
# nums[i] != nums[k - 2] compares 2 != 1 -&gt; True
# nums[3] stays 2, k becomes 4

# step 4
nums = [1, 1, 2, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
                    k  i
# i is 5 and k is 4
# nums[i] != nums[k - 2] compares 3 != 2 -&gt; True
# nums[4] = 3, k becomes 5 and is returned
nums = [1, 1, 2, 2, 3, 3]
</code></pre>
<p>This process continues until the end. It is quite a tricky algorithm to follow without visualisation. With two pointers, the time complexity is close to <em>O(n)</em>.</p>
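<p>To convince myself that the counting approach and the two-pointer approach agree, here is a quick randomised check. Both functions are lifted from the snippets above; the harness itself is my addition.</p>

```python
import random

def remove_dup_count(nums):
    # the count/remove approach, close to O(n^2)
    for n in set(nums):
        while nums.count(n) > 2:
            nums.remove(n)
    return len(nums)

def remove_dup_two_pointer(nums):
    # the two-pointer approach, close to O(n)
    k = 2
    for i in range(2, len(nums)):
        if nums[i] != nums[k - 2]:
            nums[k] = nums[i]
            k += 1
    return k

random.seed(0)
for _ in range(100):
    base = sorted(random.choices(range(5), k=random.randint(2, 12)))
    a, b = base[:], base[:]
    ka, kb = remove_dup_count(a), remove_dup_two_pointer(b)
    assert ka == kb and a[:ka] == b[:kb]  # same length, same prefix
```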
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 151 Reverse Words in a String]]></title><description><![CDATA[Understanding the Problem
A string is given with spaces. The string could be a single character or a complete sentence. It is to reverse the string by words. The spaces at the edges should be cut off.
# example 1
Input: s = "the sky is blue"
Output: ...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-151-reverse-words-in-a-string</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-151-reverse-words-in-a-string</guid><category><![CDATA[#Slicing]]></category><category><![CDATA[Python]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Fri, 28 Mar 2025 00:33:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743121999694/f2608574-4623-4dc3-b7a0-040bf70192eb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A string is given with spaces. The string could be a single character or a complete sentence. It is to reverse the string by words. The spaces at the edges should be cut off.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: s = <span class="hljs-string">"the sky is blue"</span>
Output: <span class="hljs-string">"blue is sky the"</span>

<span class="hljs-comment"># example 2</span>
Input: s = <span class="hljs-string">"  the sky is blue  "</span>
Output: <span class="hljs-string">"blue is sky the"</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I would first make it a list, then reverse it. Quite a simple problem, it seems.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I made the string into a list with <code>split()</code>, which automatically removes all the spaces and produces a word list; <code>s_list</code> below will hold <code>s_list = ["the", "sky", "is", "blue"]</code>. Afterwards, I reversed the list with <code>s_list[::-1]</code>. Slicing follows the sequence [start:stop:step]; when the step is negative, elements are taken in reverse order, so leaving start and stop empty selects every element and <code>-1</code> reverses them.</p>
<p>When the list is reversed, then I simply joined the list.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">reverseWords</span>(<span class="hljs-params">self, s: str</span>) -&gt; str:</span>
        s_list = s.split()
        <span class="hljs-keyword">return</span> <span class="hljs-string">" "</span>.join(s_list[::<span class="hljs-number">-1</span>])
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>If I had no knowledge of the built-in methods <code>split()</code> and <code>join()</code> and the slicing technique, this problem would have been a conundrum. Luckily my answer ran in 0 ms.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 392 Is Subsequence]]></title><description><![CDATA[Understanding the Problem
Given strings s and t, it is to tell whether s is the subsequence to t. It is not finding if the characters in s exist in t. Well basically it is, but the sequence matters too.
# example
Input: s = "abc", t = "ahbgdc"
Output...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-392-is-subsequence</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-392-is-subsequence</guid><category><![CDATA[subsequence]]></category><category><![CDATA[leetcode]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Wed, 26 Mar 2025 10:50:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742995570790/d51c4c73-cf23-413f-8cca-ef2da016a252.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>Given strings <code>s</code> and <code>t</code>, the task is to tell whether <code>s</code> is a subsequence of <code>t</code>. It is not just finding whether the characters of <code>s</code> exist in <code>t</code>; well, basically it is, but the order matters too.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example</span>
Input: s = <span class="hljs-string">"abc"</span>, t = <span class="hljs-string">"ahbgdc"</span>
Output: true
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>This would require a <code>for</code> loop to check each character. The loop will go through <code>t</code> as it may or may not contain <code>s</code>.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I declared <code>idx</code> and initialized it to 0, to start from the first index of <code>s</code>. Only when <code>c</code> from the loop over <code>t</code> is identical to <code>s[idx]</code> do I add one to <code>idx</code>, moving to the next character of <code>s</code>.</p>
<p>But it needs to check whether <code>idx</code> has reached the end of <code>s</code>. If it has, there is no need to check any further characters, so just return <code>True</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isSubsequence</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        idx = <span class="hljs-number">0</span>
        <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> t:
            <span class="hljs-keyword">if</span> idx &gt;= len(s):
                <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
            <span class="hljs-keyword">elif</span> c == s[idx]:
                idx += <span class="hljs-number">1</span>

        <span class="hljs-keyword">if</span> idx == len(s):
            <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>The fact that I was trying to solve this with <code>sort()</code> or <code>set()</code> was a bit hilarious, but it was worth seeing the many test cases that my code failed on. The submission ran in 0ms, which I am satisfied with.</p>
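<p>As a side note, the same forward-scanning idea can be written more compactly with Python iterators. This is only a sketch of an equivalent approach, and the helper name <code>is_subsequence</code> is mine, not LeetCode's method signature:</p>

```python
def is_subsequence(s: str, t: str) -> bool:
    # `in` on an iterator consumes it, so each character of s is
    # searched for only in the part of t after the previous match.
    it = iter(t)
    return all(c in it for c in s)
```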
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 242 Valid Anagram]]></title><description><![CDATA[Understanding the Problem
It is to find valid anagram. Anagram is a word, as for this problem, which can be rearranged to be formed another word. In this problem, s and t strings are given with characters each, and they can form exactly the same word...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-242-valid-anagram</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-242-valid-anagram</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Valid Anagram]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Tue, 25 Mar 2025 05:39:12 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>This problem asks for a valid anagram. An anagram, as far as this problem is concerned, is a word that can be rearranged to form another word. Strings <code>s</code> and <code>t</code> are given, and they may or may not consist of exactly the same characters. If <code>s</code> can be rearranged into <code>t</code>, or <code>t</code> into <code>s</code>, return <code>True</code>. If the characters in <code>s</code> and <code>t</code> do not match up, return <code>False</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: s = <span class="hljs-string">"anagram"</span>, t = <span class="hljs-string">"nagaram"</span>
Output: <span class="hljs-literal">True</span>

<span class="hljs-comment"># example 2</span>
Input: s = <span class="hljs-string">"anagram"</span>, t = <span class="hljs-string">"annagrmm"</span>
Output: <span class="hljs-literal">False</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I would solve the problem by simply sorting the two strings and comparing them.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isAnagram</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        <span class="hljs-comment"># 20ms runtime</span>
        s_sorted = sorted(s)
        t_sorted = sorted(t)

        <span class="hljs-keyword">if</span> s_sorted != t_sorted:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>This was simple, but it took a 20ms runtime because sorting both strings dominated the work. So I rewrote the code to make it run faster.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isAnagram</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        <span class="hljs-comment"># 7ms runtime</span>
        checked = []
        <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> s:
            <span class="hljs-keyword">if</span> c <span class="hljs-keyword">in</span> checked:
                <span class="hljs-keyword">continue</span>
            <span class="hljs-keyword">elif</span> s.count(c) == t.count(c):
                checked.append(c)
            <span class="hljs-keyword">else</span>:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>In this code, the <code>for</code> loop visits each character of <code>s</code> as <code>c</code>. Each character is counted in both <code>s</code> and <code>t</code>, and the counts are compared. If the counts are equal, <code>c</code> is put into <code>checked</code> so the same character is not counted again.</p>
<p>This code reduced the runtime by 13ms (20ms → 7ms).</p>
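<p>For reference, a common alternative that I did not use here is <code>collections.Counter</code>, which builds the character counts in a single pass per string:</p>

```python
from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    # Two strings are anagrams exactly when their character
    # frequency tables are equal.
    return Counter(s) == Counter(t)
```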
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>Well, I normally do not have to consider time complexity in easy problems like this, but the two sorting lines did not sit comfortably with me. It is better to practice the runtime efficiency that harder problems demand, I think.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 205 Isomorphic Strings]]></title><description><![CDATA[Understanding the Problem
There are strings as s and t. The length of s and length of t are equal as len(s) == len(t). This problem requires deciding whether s and t are isomorphic, which means they have the same structure, or pattern.
As an example,...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-205-isomorphic-strings</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-205-isomorphic-strings</guid><category><![CDATA[leetcode]]></category><category><![CDATA[hashmap]]></category><category><![CDATA[dictionary]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Mon, 24 Mar 2025 14:21:11 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>There are two strings, <code>s</code> and <code>t</code>, of equal length, so <code>len(s) == len(t)</code>. The problem asks whether <code>s</code> and <code>t</code> are isomorphic, which means they have the same structure, or pattern.</p>
<p>As an example, when there are strings <code>s = “add”</code> and <code>t = “egg”</code>, <code>”a”</code> could be substituted by <code>”e”</code>, and <code>”d”</code> could be substituted by <code>”g”</code>. If a mapping like <code>”a”</code> → <code>”e”</code> is not applied consistently, the strings are not isomorphic, thus return <code>False</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: s = <span class="hljs-string">"add"</span>, t = <span class="hljs-string">"egg"</span>
Output: <span class="hljs-literal">True</span>

<span class="hljs-comment"># example 2</span>
Input: s = <span class="hljs-string">"add"</span>, t = <span class="hljs-string">"ege"</span>
Output: <span class="hljs-literal">False</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>This problem was somewhat similar to one of those problems I had solved already. First I declare a dictionary <code>d</code>, then add each key-value pair to it one by one if neither the key nor the value exists yet; otherwise return <code>False</code>.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I went with one <code>for</code> loop over the character pairs, plus a comprehension that collects the dictionary's values into <code>vals</code> on each iteration, so that <code>vals</code> can be used in the <code>if</code> conditions.</p>
<ul>
<li><p>The 1st <code>if</code> checks whether the key and value are already present in <code>d</code>. If neither exists, simply create the key-value pair in <code>d</code>.</p>
</li>
<li><p>The 2nd checks the case where the key does not exist in <code>d</code> but the value does.</p>
</li>
<li><p>The 3rd checks the case where the key exists in <code>d</code> but is mapped to a value different from the current one.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isIsomorphic</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        d = {}

        <span class="hljs-keyword">for</span> i, j <span class="hljs-keyword">in</span> zip(s, t):
            vals = [v <span class="hljs-keyword">for</span> v <span class="hljs-keyword">in</span> d.values()]

            <span class="hljs-keyword">if</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> vals:
                d[i] = j
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">in</span> vals:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> d[i] != j:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>The solution above seemed pretty okay, but rebuilding <code>vals</code> with a comprehension on every iteration was a little disturbing. It took 15ms to run the entire test suite. So I removed that inner loop and instead <code>append()</code> each new value to <code>vals</code>, so the values are not re-collected on every iteration.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isIsomorphic</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        <span class="hljs-comment"># saved runtime by 11ms!</span>
        d = {}
        vals = []

        <span class="hljs-keyword">for</span> i, j <span class="hljs-keyword">in</span> zip(s, t):
            <span class="hljs-keyword">if</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> vals:
                d[i] = j
                vals.append(j)
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">in</span> vals:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> d[i] != j:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>This solution passed in 4ms, which is 11ms faster than the previous one.</p>
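<p>As an aside, a variant I have seen elsewhere (not the solution above) drops the <code>vals</code> list entirely by keeping a mapping in each direction, so the list membership test becomes two O(1) dictionary lookups:</p>

```python
def is_isomorphic(s: str, t: str) -> bool:
    fwd, bwd = {}, {}  # s -> t and t -> s mappings
    for a, b in zip(s, t):
        # setdefault stores the mapping on first sight and returns
        # the stored value; a mismatch in either direction fails.
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True
```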
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>A hashmap, or dictionary, problem is about checking keys and values, inserting, removing, and so on, I think. But considering the time complexity is also important. I had better practice reducing time complexity to avoid unnecessary memory usage.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 35 Search Insert Position]]></title><description><![CDATA[Understanding the Problem
This should be a typical binary search algorithm problem. This problem is meticulous about the time efficiency as this has to be solved with the time complex of O(log n).
There is a number target as an integer. It is to find...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-35-search-insert-position</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-35-search-insert-position</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Binary Search Algorithm]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 22 Mar 2025 07:59:13 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>This should be a typical binary search problem. It is meticulous about time efficiency, as it has to be solved with a time complexity of <em>O(log n)</em>.</p>
<p>There is an integer <code>target</code>. The task is to find the right spot for <code>target</code> in the list <code>nums</code>, where the numbers are sorted in ascending order like <code>[1, 3, 6, 7]</code>. If <code>target = 2</code>, the returned value should be <code>1</code>, as <code>target</code> belongs between 1 and 3, which puts it at index 1.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>Binary search was not something I had really considered before. I had always handled sorting problems with the <code>sorted()</code> function and search problems with the <code>index()</code> or <code>find()</code> methods.</p>
<p>My first idea was to find the middle as <code>half</code> and slice the list down as the loop went on. This was completely the wrong approach, because slicing throws away the original indices, so I could not recover the answer from it.</p>
<p>The final approach was to update <code>left</code> and <code>right</code> as index values. I had a little help from an LLM, though.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>Treating <code>left</code> and <code>right</code> as indices mitigated the complexity of the code.</p>
<ol>
<li><p>First, <code>left</code> and <code>right</code> were set to index values: the beginning and the end of the list.</p>
</li>
<li><p>The <code>while</code> loop stops once <code>left</code> passes <code>right</code>, so nothing is processed after <code>left</code> becomes bigger than <code>right</code>.</p>
</li>
<li><p>Take <code>half</code> as the middle index. If <code>target</code> is equal to the value <code>nums[half]</code>, return the index <code>half</code>.</p>
</li>
<li><p>If <code>target</code> is greater than <code>nums[half]</code>, <code>left</code> is updated to <code>half + 1</code>. This jumps the search past the middle so the binary approach is applied; the <code>+ 1</code> is needed because <code>nums[half]</code> itself has already been ruled out.</p>
<ul>
<li><p>For instance, imagine the list is <code>nums = [1, 3, 5, 7]</code> and <code>target</code> is <code>6</code>. In the first loop, <code>half</code> will be <code>(0 + 3) // 2</code>, which will be 1.</p>
</li>
<li><p><code>target</code> is greater than <code>nums[half]</code>, which is 3, so update <code>left</code> with <code>left = half + 1 = 2</code>, so the search resumes from index 2.</p>
</li>
<li><p>Then this time only <code>nums = [5, 7]</code> will be considered.</p>
</li>
</ul>
</li>
<li><p>In other cases, update <code>right</code> to <code>half - 1</code> to keep searching the left half.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">searchInsert</span>(<span class="hljs-params">self, nums: List[int], target: int</span>) -&gt; int:</span>
        left = <span class="hljs-number">0</span>
        right = len(nums) - <span class="hljs-number">1</span>

        <span class="hljs-keyword">while</span> left &lt;= right:
            half = (left + right) // <span class="hljs-number">2</span>
            <span class="hljs-keyword">if</span> target == nums[half]:
                <span class="hljs-keyword">return</span> half
            <span class="hljs-keyword">elif</span> target &gt; nums[half]:
                left = half + <span class="hljs-number">1</span>
            <span class="hljs-keyword">else</span>:
                right = half - <span class="hljs-number">1</span>
        <span class="hljs-keyword">return</span> left
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>From my perspective, algorithm problems often require searching methods with good time complexity. Binary search, with its <em>O(log n)</em> time complexity, is very typical, and I should be familiar with it. The key is to work with index values rather than slicing the actual list.</p>
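<p>For what it is worth, Python's standard library already implements this exact search: <code>bisect.bisect_left</code> returns the same insert position, which is handy for checking answers:</p>

```python
from bisect import bisect_left

# bisect_left returns the leftmost index at which target can be
# inserted while keeping nums sorted, matching searchInsert.
nums = [1, 3, 5, 7]
print(bisect_left(nums, 6))  # 3
print(bisect_left(nums, 5))  # 2
print(bisect_left(nums, 0))  # 0
```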
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 14 Longest Common Prefix]]></title><description><![CDATA[Understanding the Problem
It is simply to find the common prefix in between the strings in the list strs. strs contains strings like strs = ["flower","flow","flight"]. In this case the common prefix will be ”fl” which is included in every string in c...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-14-longest-common-prefix</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-14-longest-common-prefix</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Python]]></category><category><![CDATA[string]]></category><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Thu, 20 Mar 2025 08:36:24 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>It is simply to find the common prefix among the strings in the list <code>strs</code>. <code>strs</code> contains strings like <code>strs = ["flower","flow","flight"]</code>. In this case the common prefix is <code>”fl”</code>, which every string shares. When there is no common prefix, an empty string is returned.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>The only idea I had was to go with two nested <code>for</code> loops at <em>O(n²)</em>. The characters of the first string serve as the base against which the corresponding characters of every other string are compared.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>There is an edge case where <code>strs</code> holds only a single string, which may even be empty like <code>””</code>, so I return that string directly when the list length is 1.</p>
<p>Otherwise, common characters are piled up in <code>s = ““</code> one at a time. The code then takes the following steps.</p>
<ol>
<li><p>The outer loop runs over <code>strs[0]</code>, and an inner loop goes through each string as <code>word</code>.</p>
</li>
<li><p>Only while the index <code>i</code> is within the length of <code>word</code> do we check whether the character is common. If it is not common, return <code>s</code> to finish the code.</p>
</li>
<li><p>If <code>i</code> is equal or greater than <code>len(word)</code>, return <code>s</code> as it is no use running the code any further.</p>
</li>
<li><p>Other than that, add the character to <code>s</code>.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">longestCommonPrefix</span>(<span class="hljs-params">self, strs: List[str]</span>) -&gt; str:</span>
        <span class="hljs-keyword">if</span> len(strs) == <span class="hljs-number">1</span>:
            <span class="hljs-keyword">return</span> strs[<span class="hljs-number">0</span>]

        s = <span class="hljs-string">""</span>

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(strs[<span class="hljs-number">0</span>])):
            <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> strs:
                <span class="hljs-keyword">if</span> i &lt; len(word): <span class="hljs-comment"># only when index is within range</span>
                    <span class="hljs-keyword">if</span> word[i] != strs[<span class="hljs-number">0</span>][i]:
                        <span class="hljs-keyword">return</span> s
                <span class="hljs-keyword">elif</span> i &gt;= len(word):
                    <span class="hljs-keyword">return</span> s
            s += strs[<span class="hljs-number">0</span>][i]
        <span class="hljs-keyword">return</span> s
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>I have looked up other solutions on LeetCode, and it seems most people chose to solve the problem at <em>O(n²)</em>, just like I did. I believe there should be a better solution, though.</p>
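<p>One tidier variant I am aware of (not taken from the problem's editorial) leans on <code>zip(*strs)</code>, which groups the i-th characters of all strings together and stops automatically at the shortest string, removing the manual bounds check:</p>

```python
def longest_common_prefix(strs):
    prefix = []
    # zip(*strs) yields tuples of the i-th character of every word,
    # ending at the shortest word.
    for chars in zip(*strs):
        if len(set(chars)) != 1:  # the characters disagree here
            break
        prefix.append(chars[0])
    return "".join(prefix)
```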
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 70 Climbing Stairs]]></title><description><![CDATA[Understanding the Problem
A number is given as n. Imagine that only 1 or 2 steps could be taken at once to reach the number n. For instance, if the number is 4, there are 5 ways to reach number 4.
n = 4

# ways to reach n
1, 1, 1, 1 # 1
2, 1, 1 # 2
1...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-70-climbing-stairs</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-70-climbing-stairs</guid><category><![CDATA[leetcode]]></category><category><![CDATA[patterns]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Tue, 18 Mar 2025 11:21:08 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A number is given as <code>n</code>. Imagine that only 1 or 2 steps could be taken at once to reach the number <code>n</code>. For instance, if the number is 4, there are 5 ways to reach number 4.</p>
<pre><code class="lang-python">n = <span class="hljs-number">4</span>

<span class="hljs-comment"># ways to reach n</span>
<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span> <span class="hljs-comment"># 1</span>
<span class="hljs-number">2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span> <span class="hljs-comment"># 2</span>
<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span> <span class="hljs-comment"># 3</span>
<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span> <span class="hljs-comment"># 4</span>
<span class="hljs-number">2</span>, <span class="hljs-number">2</span> <span class="hljs-comment"># 5</span>
</code></pre>
<p>So the return number should be 5.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I attempted to solve the problem with a mathematical approach, but I could not find a pattern by dividing by 2 or other tricks.</p>
<p>After some trials of manually enumerating the ways to reach the number, I found a pattern: each count is the sum of the previous two counts. With this hint, I used a loop to complete the task.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>For the first three numbers, the answer is <code>n</code> itself, so I return <code>n</code> when it is less than or equal to 3. Then I took the steps as follows.</p>
<ol>
<li><p>Beyond that, the number is at least 4, so I declared <code>nums = [1, 2, 3]</code> first, and <code>count = 3</code></p>
</li>
<li><p>In the <code>while</code> loop, it appends to <code>nums</code> the sum of the last two numbers by accessing with <code>nums[-1]</code> and <code>nums[-2]</code></p>
</li>
<li><p>Eventually return <code>nums[-1]</code></p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">climbStairs</span>(<span class="hljs-params">self, n: int</span>) -&gt; int:</span>
        <span class="hljs-keyword">if</span> n &lt;= <span class="hljs-number">3</span>:
            <span class="hljs-keyword">return</span> n

        nums = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
        count = <span class="hljs-number">3</span>

        <span class="hljs-keyword">while</span> count &lt; n:
            nums.append(nums[<span class="hljs-number">-1</span>] + nums[<span class="hljs-number">-2</span>])
            count += <span class="hljs-number">1</span>

        <span class="hljs-keyword">return</span> nums[<span class="hljs-number">-1</span>]
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>It took some time to solve the problem until I figured out the pattern of the steps. I also learned that manually enumerating cases to look for a pattern can help find the solution in problems like this, though it is not always the best practice.</p>
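<p>As a footnote, the list is not strictly necessary; the same recurrence fits in two variables. This is a sketch of a constant-space variant of the loop above:</p>

```python
def climb_stairs(n: int) -> int:
    a, b = 1, 2  # ways to reach step 1 and step 2
    for _ in range(n - 1):
        # Each step is reachable from the previous step (one hop)
        # or the step before that (two hops).
        a, b = b, a + b
    return a
```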
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 290 Word Pattern]]></title><description><![CDATA[Understanding the Problem
There are two strings given. One is pattern, and another one is a string with spaces. If the pattern is ”abba” and the string is ”dog cat cat dog”, the first index ”a” takes ”dog”, the second index ”b” takes ”cat”, the third...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-290-word-pattern</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-290-word-pattern</guid><category><![CDATA[Python]]></category><category><![CDATA[dictionary]]></category><category><![CDATA[hashmap]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 15 Mar 2025 09:03:40 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>There are two strings given. One is pattern, and another one is a string with spaces. If the pattern is <code>”abba”</code> and the string is <code>”dog cat cat dog”</code>, the first index <code>”a”</code> takes <code>”dog”</code>, the second index <code>”b”</code> takes <code>”cat”</code>, the third index <code>”b”</code> takes <code>”cat”</code> again, the fourth index <code>”a”</code> takes <code>”dog”</code>. The <code>”a”</code> all matches with <code>”dog”</code> and all <code>”b”</code> matches with <code>”cat”</code>, so it returns <code>True</code>.</p>
<p>See the example below.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: pattern = <span class="hljs-string">"abba"</span>, s = <span class="hljs-string">"dog cat cat dog"</span>
Output: <span class="hljs-literal">True</span>

<span class="hljs-comment"># example 2</span>
Input: pattern = <span class="hljs-string">"abba"</span>, s = <span class="hljs-string">"dog cat dog dog"</span>
Output: <span class="hljs-literal">False</span>
</code></pre>
<p>The second example returns <code>False</code> because <code>”b”</code> patterns have two different values <code>”cat”</code> and <code>”dog”</code>, so the patterns do not have relative values.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>It is a hashmap problem. I would use a hashmap and check whether the key-value pair formed by each pattern character and word matches what is already in the dictionary.</p>
<ol>
<li><p>Use dictionary</p>
</li>
<li><p>loop the pattern and check the dictionary</p>
</li>
</ol>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I had to declare the dictionary first and fill it up one pair at a time to check each subsequent key and value of the pattern. As an exception, if the length of <code>pattern</code> and the number of words in <code>s</code> are not equal, I returned <code>False</code>.</p>
<p>The key to solving the problem is like this.</p>
<ol>
<li><p><code>for</code> looping with <code>zip(pattern, s.split())</code> to check key and value in pair</p>
</li>
<li><p>Get all values and keys of the dictionary in <code>vals</code> and <code>keys</code></p>
</li>
<li><p>Check with conditions</p>
</li>
</ol>
<ul>
<li><p>if the pattern character is not among the dictionary keys, and the word is not among its values either, simply add the key and value</p>
</li>
<li><p>if the pattern character is not among the keys but the word is already a value, the pattern does not match, so return <code>False</code>. For example, if the dictionary holds <code>d = {"a": "dog"}</code> and the current pattern character is <code>”b”</code> with value <code>”dog”</code>, the pattern does not match the current dictionary</p>
</li>
<li><p>if the pattern character is among the keys but the word is not equal to its stored value, return <code>False</code>. For example, if the dictionary holds <code>d = {"a": "dog"}</code> and the current pattern character is <code>“a”</code> with value <code>”cat”</code>, the pattern does not match</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">wordPattern</span>(<span class="hljs-params">self, pattern: str, s: str</span>) -&gt; bool:</span>
        <span class="hljs-keyword">if</span> len(pattern) != len(s.split()):
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        d = {}

        <span class="hljs-keyword">for</span> p, w <span class="hljs-keyword">in</span> zip(pattern, s.split()):
            vals = d.values()
            keys = d.keys()

            <span class="hljs-keyword">if</span> p <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> keys <span class="hljs-keyword">and</span> w <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> vals:
                d[p] = w
            <span class="hljs-keyword">elif</span> p <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> keys <span class="hljs-keyword">and</span> w <span class="hljs-keyword">in</span> vals:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">elif</span> p <span class="hljs-keyword">in</span> keys <span class="hljs-keyword">and</span> d[p] != w:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>It runs in <em>O(n)</em> time, and the solution is acceptable, I think. The long chain of <code>elif</code> conditions makes me a little uncomfortable, but other people chose the same way to solve the problem.</p>
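<p>For comparison, the <code>elif</code> chain can be avoided entirely with a one-pass set comparison. This is a sketch of a well-known alternative, not the submitted solution: the mapping is one-to-one exactly when the counts of distinct letters, distinct words, and distinct (letter, word) pairs all agree.</p>
<pre><code class="lang-python">def word_pattern(pattern, s):
    words = s.split()
    # the mapping is a bijection exactly when the numbers of distinct
    # letters, distinct words, and distinct (letter, word) pairs agree
    return (len(pattern) == len(words)
            and len(set(pattern)) == len(set(words)) == len(set(zip(pattern, words))))
</code></pre>
<p>For instance, <code>word_pattern("abba", "dog cat cat dog")</code> is <code>True</code>, while <code>"abba"</code> against <code>"dog dog dog dog"</code> fails because two letters map to one word.</p>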
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 28 Find the Index of the First Occurrence in a String]]></title><description><![CDATA[Understanding the Problem
There are two strings given as haystack and needle. needle is the string that could be included in haystack or not included. If it is included, return the first index that needle occurs. If it is not included in haystack, re...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-28-find-the-index-of-the-first-occurrence-in-a-string</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-28-find-the-index-of-the-first-occurrence-in-a-string</guid><category><![CDATA[leetcode]]></category><category><![CDATA[string]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Fri, 14 Mar 2025 15:50:09 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>There are two strings given as <code>haystack</code> and <code>needle</code>. <code>needle</code> is the string that could be included in <code>haystack</code> or not included. If it is included, return the first index that <code>needle</code> occurs. If it is not included in <code>haystack</code>, return <code>-1</code>.</p>
<p>For example, if <code>haystack = "sadbutsad"</code> and <code>needle = "sad"</code>, the string <code>"sad"</code> occurs at indexes <code>0</code> and <code>6</code>, so return <code>0</code>.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>My approach is to check whether <code>needle</code> is in <code>haystack</code>, then find the first index.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<ol>
<li><p>Check if <code>needle</code> is in <code>haystack</code> by <code>if needle in haystack</code> clause</p>
</li>
<li><p>If it exists in <code>haystack</code>, then use <code>find()</code> method to get the first index of the <code>needle</code> and return it</p>
</li>
<li><p>If it does not exist in <code>haystack</code>, return <code>-1</code></p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">strStr</span>(<span class="hljs-params">self, haystack: str, needle: str</span>) -&gt; int:</span>
        <span class="hljs-keyword">return</span> haystack.find(needle) <span class="hljs-keyword">if</span> needle <span class="hljs-keyword">in</span> haystack <span class="hljs-keyword">else</span> <span class="hljs-number">-1</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>I was wondering if I should use two pointers to solve this problem, but there was no need, since Python's built-in functions and methods handle it.</p>
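<p>For reference, a manual scan without <code>find()</code> might look like this (a sketch, not the submitted solution): slide a window of <code>len(needle)</code> characters across <code>haystack</code> and compare each window against <code>needle</code>.</p>
<pre><code class="lang-python">def str_str(haystack, needle):
    n, m = len(haystack), len(needle)
    # slide a window of length m across haystack
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i  # first index where needle occurs
    return -1         # needle not found
</code></pre>
<p>So <code>str_str("sadbutsad", "sad")</code> returns <code>0</code>, matching the built-in behavior.</p>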
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 58 Length of Last Word]]></title><description><![CDATA[Understanding the Problem
A string is given. It has spaces in between. The task is to return the last string’s length. If the string is s = “Hello world”, the last string will be ”world” and its length will be 5, so return 5.
Approach
My approach wil...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-58-length-of-last-word</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-58-length-of-last-word</guid><category><![CDATA[Python]]></category><category><![CDATA[leetcode]]></category><category><![CDATA[string]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Thu, 13 Mar 2025 05:07:18 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A string is given, with spaces in between. The task is to return the length of the last word. If the string is <code>s = "Hello world"</code>, the last word is <code>"world"</code> and its length is 5, so return 5.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>My approach will be to <code>split()</code>, get the last index and return length of it.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>It was a simple problem. It would take a few lines without the help of built-in functions, but with them it takes just one line.</p>
<p>First, split the string by space, then get the last index with <code>[-1]</code>, then get the length of the last index with <code>len()</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lengthOfLastWord</span>(<span class="hljs-params">self, s: str</span>) -&gt; int:</span>
        <span class="hljs-keyword">return</span> len(s.split()[<span class="hljs-number">-1</span>])
</code></pre>
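<p>The few-line version without the built-ins might look like this (a sketch for comparison): walk backwards past any trailing spaces, then count characters until the next space.</p>
<pre><code class="lang-python">def length_of_last_word(s):
    i = len(s) - 1
    # skip any trailing spaces
    while i >= 0 and s[i] == " ":
        i -= 1
    # count the characters of the last word
    length = 0
    while i >= 0 and s[i] != " ":
        length += 1
        i -= 1
    return length
</code></pre>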
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>Nothing much to say about this one. I was happy to see that this kind of problem could show up in interviews.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 121 Best Time to Buy and Sell Stock]]></title><description><![CDATA[Understanding the Problem
It is to find the best time to sell the stock with the max profit. A given list indicates the stock prices. Each element is a stock price in time series. For instance, if prices = [7,1,5,3,6,4], the best time to buy the stoc...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-121-best-time-to-buy-and-sell-stock</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-121-best-time-to-buy-and-sell-stock</guid><category><![CDATA[Python]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Wed, 12 Mar 2025 14:38:48 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>The task is to find the best times to buy and sell the stock for the maximum profit. A given list indicates the stock prices; each element is a stock price in a time series. For instance, if <code>prices = [7,1,5,3,6,4]</code>, the best time to buy is when the price is <code>1</code> and the best time to sell is when the price is <code>6</code>, so the profit is <code>5</code>, which is the number to return.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>My idea was to loop over the list twice with nested <code>for</code> loops to find the best profit. But I faced the <em>time limit</em> error and could not pass with <em>O(n²)</em>; I had to bring it down to <em>O(n)</em>.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>Well, my goal was to track the minimum price and the maximum difference to find the best profit. So first initialize <code>min_num</code> with <code>float("inf")</code>. The code then loops over the prices one by one.</p>
<p>If the element <code>p</code> is lower than <code>min_num</code>, set <code>min_num</code> to <code>p</code> for the subtraction comparison. If <code>p - min_num</code> is greater than <code>max_diff</code>, which means the profit is greater, set <code>max_diff</code> to <code>p - min_num</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">maxProfit</span>(<span class="hljs-params">self, prices: List[int]</span>) -&gt; int:</span>
        min_num = float(<span class="hljs-string">"inf"</span>)
        max_diff = <span class="hljs-number">0</span>

        <span class="hljs-keyword">for</span> p <span class="hljs-keyword">in</span> prices:
            <span class="hljs-keyword">if</span> p &lt; min_num:
                min_num = p
            <span class="hljs-keyword">elif</span> p - min_num &gt; max_diff:
                max_diff = p - min_num
        <span class="hljs-keyword">return</span> max_diff
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>The code was simple, but it took me some time to solve the problem. I was busy visiting my parents and looking for other jobs to move to, so I could not focus on solving algorithm problems. Anyway, it came down to time efficiency this time again.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 169 Majority Element]]></title><description><![CDATA[Understanding the Problem
It is a simple task to return the number which appears the most in the list nums. If the input is nums = [2,2,1,1,1,2,2], the number that appears the most will be 2, thus return 2.
Approach
I would first acquire the unique n...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-169-majority-element</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-169-majority-element</guid><category><![CDATA[Python]]></category><category><![CDATA[algorithms]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Fri, 07 Mar 2025 08:33:10 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>It is a simple task to return the number which appears the most in the list <code>nums</code>. If the input is <code>nums = [2,2,1,1,1,2,2]</code>, the number that appears the most will be <code>2</code>, thus return <code>2</code>.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I would first acquire the unique numbers from <code>nums</code> and count how many times each of them appears in the list <code>nums</code>. Then return the number that appears the most.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">majorityElement</span>(<span class="hljs-params">self, nums: List[int]</span>) -&gt; int:</span>
        unique = list(set(nums))
        max_num = [nums.count(i) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> unique]
        idx = max_num.index(max(max_num))

        value = unique[idx]
        <span class="hljs-keyword">return</span> value
</code></pre>
<p>As simple as that: get the unique numbers with <code>set()</code> and convert the result to a <code>list()</code> so the elements can be indexed. Then count how many times each unique number appears. Finally, find the position of the largest count with <code>max_num.index(max(max_num))</code> and return the number at that position in <code>unique</code>.</p>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>I thought there would be a better solution to this, but other people solved it pretty much the same way. It did not take too long, as it was rated <em>easy</em> by LeetCode.</p>
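<p>For what it's worth, there is a well-known <em>O(1)</em>-space alternative, the Boyer–Moore voting algorithm (a sketch, not what I submitted): keep a single candidate and a counter, and rely on the problem's guarantee that a majority element always exists.</p>
<pre><code class="lang-python">def majority_element(nums):
    count = 0
    candidate = None
    for n in nums:
        if count == 0:
            # adopt a new candidate whenever the counter is exhausted
            candidate = n
        count += 1 if n == candidate else -1
    # the guaranteed majority element always survives the voting
    return candidate
</code></pre>
<p>Because the majority element appears more than <code>len(nums) // 2</code> times, it can never be fully canceled out, so it is always the final candidate.</p>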
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 20 Valid Parentheses]]></title><description><![CDATA[Understanding the Problem
A given string with brackets have three types of brackets; brackets (), braces {} and squared brackets []. It is to return True if the brackets are paired to open and close in a correct way, like (){}[] or {[()]}. Other stri...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-20-valid-parentheses</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-20-valid-parentheses</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Python]]></category><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Thu, 06 Mar 2025 08:17:50 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A given string contains three types of brackets: parentheses <code>()</code>, braces <code>{}</code> and square brackets <code>[]</code>. Return <code>True</code> if the brackets are opened and closed in the correct order, like <code>(){}[]</code> or <code>{[()]}</code>. Other strings like <code>[(])</code> or <code>(()</code> are considered <em>not paired</em>, so <code>False</code> is returned.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I considered using a dictionary with the opening brackets as keys and the closing brackets as values, so I would know which opening and closing brackets pair up. Then use a stack to compare the values.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>It was not that easy until I figured out that this is a <em>stack</em> problem. I first attempted to slice the string from an opening bracket to its closing bracket and check whether the characters at those indexes matched, but that would not pass all the test cases, as it disregards the brackets in the middle of the sliced portion.</p>
<p>Using a stack was the right idea, and I wrote the code in these steps.</p>
<ol>
<li><p>If the string has an odd length or a closing bracket comes first, return <code>False</code></p>
</li>
<li><p>Go through a <code>for</code> loop, and push the opening brackets onto the stack</p>
</li>
<li><p>When the character <code>s[i]</code> is not an opening bracket, check it against the last element of the stack to see whether the key and value match. If not, return <code>False</code></p>
</li>
<li><p>If the closing bracket <code>s[i]</code> matches the opening bracket from the stack, pop it from the stack</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isValid</span>(<span class="hljs-params">self, s: str</span>) -&gt; bool:</span>
        <span class="hljs-keyword">if</span> len(s) % <span class="hljs-number">2</span> != <span class="hljs-number">0</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        d = {
            <span class="hljs-string">"("</span>: <span class="hljs-string">")"</span>,
            <span class="hljs-string">"{"</span>: <span class="hljs-string">"}"</span>,
            <span class="hljs-string">"["</span>: <span class="hljs-string">"]"</span>
        }

        stack = []
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(s)):
            <span class="hljs-keyword">if</span> len(stack) == <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> s[i] <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d.keys():
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

            <span class="hljs-keyword">if</span> s[i] <span class="hljs-keyword">in</span> d.keys():
                stack.append(s[i])
            <span class="hljs-keyword">elif</span> s[i] != d[stack[<span class="hljs-number">-1</span>]]:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">else</span>:
                stack.pop()

        <span class="hljs-keyword">if</span> len(stack) != <span class="hljs-number">0</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
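<p>The same stack idea can also be written a bit more compactly by keying the dictionary on the <em>closing</em> brackets instead, so the pop-and-compare becomes a single step. This is a sketch for comparison, not the submitted solution.</p>
<pre><code class="lang-python">def is_valid(s):
    # key on the closing brackets so each close maps to its expected opening
    pairs = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in s:
        if ch in pairs:
            # closing bracket: the stack top must be its opening partner
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            # opening bracket: remember it for later
            stack.append(ch)
    # valid only if every opening bracket was closed
    return not stack
</code></pre>
<p>This also handles the odd-length and closing-bracket-first cases without special checks, since an unmatched close or a leftover open fails naturally.</p>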
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>It is always a matter of finding the right algorithm to solve a problem. I have a tendency to spend time just coding without thinking much about the intent of the problem, but that does not always work well. I learned again that settling on the right algorithm first is necessary.</p>
]]></content:encoded></item></channel></rss>