<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ramieeee's IT blog]]></title><description><![CDATA[Algorithms, IT news, my thoughts note]]></description><link>https://ramieeee.me</link><generator>RSS for Node</generator><lastBuildDate>Sun, 10 May 2026 15:55:33 GMT</lastBuildDate><atom:link href="https://ramieeee.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[[Paper Review] Development of a machine learning
approach for prediction of red
blood cell transfusion in patients
undergoing Cesarean section
at a single institution]]></title><description><![CDATA[The current interest of mine is leaned towards the AI research applied to biomedical or healthcare domains. While scanning through papers, I spotted this paper and I decided to have a look at it. Inde]]></description><link>https://ramieeee.me/paper-review-development-of-a-machine-learning-approach-for-prediction-of-red-blood-cell-transfusion-in-patients-undergoing-cesarean-section-at-a-single-institution</link><guid isPermaLink="true">https://ramieeee.me/paper-review-development-of-a-machine-learning-approach-for-prediction-of-red-blood-cell-transfusion-in-patients-undergoing-cesarean-section-at-a-single-institution</guid><category><![CDATA[ML]]></category><category><![CDATA[Paper Review]]></category><category><![CDATA[blood transfusion]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 04 Apr 2026 14:56:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/b97723c3-9f04-4ec4-a0ba-1fa99f009534.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My current interest leans towards AI research applied to biomedical and healthcare domains. While scanning through papers, I spotted this one and decided to have a look. It was indeed intriguing, and worth sparing the time to read.</p>
<p>It describes ML models trained to predict whether parturients will require a blood transfusion while undergoing cesarean section, published in Scientific Reports in 2024.</p>
<h1>Introduction</h1>
<p>Blood loss during a cesarean section (CS) can be substantial, and transfusion is frequently required rather than occasional.</p>
<p>Preparing the right amount of blood in advance saves time intraoperatively. Blood is a scarce medical resource, so the correct amount must be prepared before surgery to avoid waste; accurate prediction of transfusion demand is therefore always needed.</p>
<p>This paper articulates the data preparation and model comparison metrics, and gives insight into which data to leverage for training ML models.</p>
<p>This paper focuses on red blood cells, excluding other blood products. Its primary aim is to select the ML model with the best performance in predicting the need for an intraoperative red blood cell (RBC) transfusion during a cesarean section (CS).</p>
<h1>Methods</h1>
<p>The data could be divided into two parts: demographic data of the patient and perioperative data.</p>
<ul>
<li><p>Data</p>
<ul>
<li><p>Data size</p>
<ul>
<li><p>total: 16,137</p>
</li>
<li><p>used: 14,254 after excluding incomplete records</p>
</li>
<li><p>RBC transfusion during surgery: 1,020 patients (7.16% of the total)</p>
</li>
<li><p>data split: 6:2:2 for training, validation and test</p>
</li>
</ul>
</li>
<li><p>the most recent data values within two days prior to surgery</p>
<ul>
<li><p>demographic data</p>
<ul>
<li><p>age, weight, height etc.</p>
</li>
<li><p>placenta previa totalis/partialis/marginalis</p>
</li>
</ul>
</li>
<li><p>perioperative data</p>
<ul>
<li>anesthesia, midazolam use, RBC transfusion</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
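<p>A 6:2:2 split on a roughly 7%-positive cohort is usually stratified so each part keeps the same positive rate; a minimal sketch of that idea (my assumption, the paper does not publish its split code):</p>

```python
import random

def split_6_2_2(labels, seed=42):
    """Split row indices 6:2:2 for train/validation/test while preserving the
    class ratio in each part. Illustrative sketch only -- stratification here
    is my assumption, not the authors' published procedure."""
    rng = random.Random(seed)
    parts = ([], [], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        a, b = int(len(idx) * 0.6), int(len(idx) * 0.8)
        for part, chunk in zip(parts, (idx[:a], idx[a:b], idx[b:])):
            part.extend(chunk)
    return parts  # (train_idx, val_idx, test_idx)
```

<p>Each split then contains the same fraction of transfusion cases as the full cohort, which keeps validation and test metrics comparable.</p>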
<p>The class ratio used for training was particularly interesting to see. It is uncommon to deliberately apply imbalanced ratios to the ground-truth labels, yet this paper compared ratios from 1:1 up to 1:4.</p>
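<p>The usual way to build such fixed-ratio training sets is to undersample the majority class; a small sketch (my illustration, not the authors' code):</p>

```python
import random

def undersample_to_ratio(pos_idx, neg_idx, ratio, seed=0):
    """Keep all positives and `ratio` times as many randomly chosen negatives,
    e.g. ratio=4 builds a 1:4 positive:negative training set."""
    rng = random.Random(seed)
    n_neg = min(len(neg_idx), ratio * len(pos_idx))
    return list(pos_idx) + rng.sample(list(neg_idx), n_neg)
```

<p>Sweeping `ratio` from 1 to 4 reproduces the kind of comparison the paper reports.</p>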
<ul>
<li><p>ML models</p>
<ul>
<li>XGBoost, KNN, DT, SVM, MLP, LR, RF, DNN</li>
</ul>
</li>
<li><p>Model assessment</p>
<ul>
<li><p>AUROC</p>
</li>
<li><p>AUPRC</p>
</li>
<li><p>metrics</p>
<ul>
<li>accuracy, recall, precision, F1</li>
</ul>
</li>
</ul>
</li>
</ul>
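<p>AUROC, one of the assessment metrics above, can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one; a minimal pairwise sketch of that reading:</p>

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs where the positive case receives the higher score (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

<p>This O(n&#xB2;) form is only for intuition; production code would use a rank-based or library implementation.</p>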
<h1>Results</h1>
<img src="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/7c2569ee-9c11-4c18-a38c-457af4507437.png" alt="" style="display:block;margin:0 auto" />

<p>According to the paper, XGBoost excelled most at predicting blood transfusion, with an AUROC of 0.82 and an accuracy of 0.94.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/062acfe7-6dd5-491c-b836-36ac19b61006.png" alt="" style="display:block;margin:0 auto" />

<p>This figure compares ROC and PRC between each model.</p>
<p>But when I look at the PRC, the recall values are so low that the curves take on odd shapes. The table above also shows that the F1 score never exceeds 0.5.</p>
<p>The figure below shows all ROC and PRC curves for different models with different dataset ratio.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67aa063749e384ed1ac1bb6e/278cdcc1-7870-4651-ae89-9b7fb907e837.png" alt="" style="display:block;margin:0 auto" />

<h1>Discussion</h1>
<ul>
<li><p>1:1 ratio dataset did not improve the performance of the model.</p>
<ul>
<li>the imbalanced datasets performed better</li>
</ul>
</li>
<li><p>Traditional statistical modeling can lead to degraded performance, whereas ML aims for broader generalization and is adequate in this case</p>
</li>
<li><p>Limits</p>
<ul>
<li><p>single-center study</p>
<ul>
<li>a heterogeneous, multi-center dataset could help generalize the blood transfusion predictors, which could enhance model performance in general</li>
</ul>
</li>
<li><p>needs more data balancing techniques for model training</p>
<ul>
<li>testing only dataset ratios from 1:1 up to 1:4 has certain limits, and could still mislead the model even when the dataset is well refined and well polished.</li>
</ul>
</li>
</ul>
</li>
</ul>
<h1>Reflection</h1>
<p>The research gave me a broad range of insight into understanding medical data and applying models to it. However, there were several moments when I felt the paper could be higher in quality, with models that could be utilized in the actual CR setting.</p>
<p>I would like to list a few things that I think would improve the models' performance and address some of the limitations the paper mentioned.</p>
<ul>
<li><p>The recall score is extremely low; only precision is high</p>
<ul>
<li><p>I think this comes from the data balance. The balance could be adjusted, or the data could be normalized or preprocessed in advance. Raw data carries risk: some features will dominate the model's parameters and distort performance. Normalizing feature ranges or trimming unnecessary features could help too.</p>
</li>
<li><p>The results indicate that XGBoost performs best, but the recall score suggests the model is of little practical use, even though precision and accuracy are high. The model is heavily biased towards predicting the negative class, since the best model was trained on a ratio other than 1:1. This implies the true positive rate is extremely low: when parturients actually need a transfusion, the model may predict that they do not, which increases the clinical risk.</p>
</li>
</ul>
</li>
<li><p>data split could be 8:1:1 to focus more on the training</p>
<ul>
<li>The dataset was small: only one to two thousand rows would remain if a 50:50 ratio were used for training. Generalizing prediction performance from such scarce data risks a highly biased result. When the dataset is small, more than 60% of the rows should go to training.</li>
</ul>
</li>
</ul>
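<p>The recall trap described above can be made concrete with toy numbers (illustrative only, not the paper's actual confusion matrix): a model that predicts "no transfusion" for everyone on a cohort with ~7% positives still reports high accuracy.</p>

```python
def summarize(tp, fp, fn, tn):
    """Accuracy and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# All-negative classifier on 1,000 patients, 70 of whom truly needed transfusion:
# it misses every transfusion case yet still looks accurate.
accuracy, recall = summarize(tp=0, fp=0, fn=70, tn=930)
```

<p>Accuracy comes out at 0.93 while recall is 0, exactly the pattern of a model biased toward the negative class.</p>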
<h1>Reference</h1>
<p>Lee, S. W., Park, B., Seo, J., Lee, S., &amp; Sim, J. H. (2024). Development of a machine learning approach for prediction of red blood cell transfusion in patients undergoing Cesarean section at a single institution. <em>Scientific Reports</em>, <em>14</em>, 16628. <a href="https://doi.org/10.1038/s41598-024-67784-2">https://doi.org/10.1038/s41598-024-67784-2</a></p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents]]></title><description><![CDATA[I have been looking for a method that will fulfill tasks for extracting data from a long-sequenced and unstructured text using LLM.
If given a pdf file of a research paper, my first approach was to iterate each pages and feed the text data from a pag...]]></description><link>https://ramieeee.me/paper-review-mem1-learning-to-synergize-memory-and-reasoning-for-efficient-long-horizon-agents</link><guid isPermaLink="true">https://ramieeee.me/paper-review-mem1-learning-to-synergize-memory-and-reasoning-for-efficient-long-horizon-agents</guid><category><![CDATA[llm]]></category><category><![CDATA[Reasoning AI Models]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 08 Feb 2026 14:04:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770559617462/bb0e1ce0-7c1b-47fc-a980-8ff7b6ac7503.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have been looking for a method that will fulfill tasks for extracting data from a long-sequenced and unstructured text using LLM.</p>
<p>Given a PDF file of a research paper, my first approach was to iterate over the pages and feed each page's text to an agent. But an agent call is stateless: it has no information about the previous page, which causes data loss when information is split across pages.</p>
<p>I then came up with the idea of a shared memory acting as state, utilized by each agent at every step.</p>
<p>This paper’s goal is to enable agents to perform better reasoning and inference, while also reducing inference time and memory utilization.</p>
<h1 id="heading-1-introduction">1. Introduction</h1>
<ul>
<li><p>Problems arising from traditional long-context data processing with LLMs</p>
<ul>
<li><p>full-context prompting, appending all past turns regardless of their relevance</p>
</li>
<li><p>Growing inference cost and memory usage</p>
</li>
<li><p>Generalization limits beyond the training horizon</p>
</li>
<li><p>Overloaded and inefficient context</p>
</li>
</ul>
</li>
<li><p>Solution</p>
<ul>
<li><p>a model to learn to consolidate its memory as part of its reasoning process</p>
</li>
<li><p>memory to be shared by agents</p>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770538101911/15fef603-358d-4da1-8b3a-740415caa043.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-2-mem1">2. MEM1</h1>
<ul>
<li><p>Annotate each component using XML-style tags</p>
<ul>
<li><p>&lt;IS&gt; for internal state (reasoning)</p>
<ul>
<li><p>summarizes past information</p>
</li>
<li><p>reasons about subsequent actions</p>
</li>
</ul>
</li>
<li><p>&lt;query&gt; for environment queries</p>
</li>
<li><p>&lt;answer&gt; for the agent’s responses</p>
</li>
<li><p>&lt;info&gt; for external observations or tool outputs</p>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770541971425/33033527-d84b-4271-be13-d4c38649a2b5.png" alt class="image--center mx-auto" /></p>
<p>The process shows that $IS_{t-1}$, $Query_{t-1}$, and $Info_{t-1}$ are consolidated into $IS_{t}$. This happens at every step to discard unnecessary data that could otherwise harm inference performance and data quality.</p>
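<p>A toy sketch of this consolidation loop (my own illustration; in MEM1 the merge is performed by RL-trained generation over the tagged prompt, not a fixed rule):</p>

```python
MAX_FACTS = 3  # toy context budget standing in for the learned consolidation

def consolidate(prev_state, new_info):
    """IS_t is rebuilt from IS_{t-1} plus the newest <info>, keeping only a
    bounded set of facts, so the prompt does not grow with the turn count."""
    return (prev_state + [new_info])[-MAX_FACTS:]

def run_episode(observations):
    state = []  # IS_0 starts empty
    for obs in observations:
        state = consolidate(state, obs)
    return state
```

<p>The point of the sketch: per-turn context stays constant-size, which is where the inference-time and memory savings come from.</p>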
<h1 id="heading-3-experiment-amp-results">3. Experiment &amp; Results</h1>
<p>Interestingly, the MEM1 approach showed two key results:</p>
<ul>
<li><p>better at inference time (less than others)</p>
</li>
<li><p>better in match count</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770558287942/e654e196-705e-4722-b2a0-33d9d4f221aa.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-reflection">Reflection</h1>
<p>For our research, the full MEM1 process would be excessive, and it addresses a slightly different topic. Still, it is notable that the paper presents a shared-memory technique to safely pass data to the next agent while casting aside incorrect data.</p>
<p>I will adapt the shared memory in our pipeline; a draft of the pipeline looks something like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770559239501/35159817-30e2-4413-aa18-2b497fad4a0b.png" alt class="image--center mx-auto" /></p>
<p>It may not capture everything accurately, but it seems at least feasible to apply to our research.</p>
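<p>A minimal sketch of that draft (all names are hypothetical; the per-page <code>agent</code> would be an LLM call in practice):</p>

```python
def extract_with_shared_memory(pages, agent):
    """Iterate over pages, passing a shared memory dict between agent calls so
    fields split across page boundaries are not lost (stateless calls drop them)."""
    memory = {}
    for page_text in pages:
        memory.update(agent(page_text, memory))  # agent returns fields found on this page
    return memory

def toy_agent(text, memory):
    """Stand-in agent: collects 'key: value' lines from the page text."""
    found = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            found[key.strip()] = value.strip()
    return found
```

<p>The real agent would also read <code>memory</code> to resolve references that continue from earlier pages.</p>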
<p>Also, considering training a model in our specific domain (Cognitive Reserve), I asked myself, “Can we train in such a way?”, and concluded that we cannot, as CR needs a specific dataset and cannot be trained and generalised in that fashion.</p>
<h1 id="heading-reference">Reference</h1>
<p>[1] Zhou, Z., Qu, A., Wu, Z., Kim, S., Prakash, A., Rus, D., ... &amp; Liang, P. P. (2025). MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents. <em>arXiv preprint arXiv:2506.15841</em>.</p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] Revolutionizing Speaker Recognition and Diarization: A Novel
Methodology in Speech Analysis]]></title><description><![CDATA[It is my first paper review about Speech-to-Text methodology. After reading this paper I tried to dig into the sound and wave world to learn how the digitalized sound data is transformed into the form that humans can understand.
My company is now wor...]]></description><link>https://ramieeee.me/paper-review-revolutionizing-speaker-recognition-and-diarization-a-novel-methodology-in-speech-analysis</link><guid isPermaLink="true">https://ramieeee.me/paper-review-revolutionizing-speaker-recognition-and-diarization-a-novel-methodology-in-speech-analysis</guid><category><![CDATA[STT]]></category><category><![CDATA[Diarization]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 05 Jul 2025 08:52:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751705472244/ee752ae8-56cf-45ca-bcda-95483559518f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It is my first paper review about Speech-to-Text methodology. After reading this paper I tried to dig into the sound and wave world to learn how the digitalized sound data is transformed into the form that humans can understand.</p>
<p>My company is now working on Speech-to-Text and trying to catch up with state-of-the-art techniques to provide better-quality services. Yet we have lacked the research and a solid background in STT, so I picked several research papers, including this one. We need diarization as well as transcription, to identify the speaker. This paper helped me understand what the architecture and algorithms should look like before building such a service.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751702157529/02ec8b7e-fb0a-4342-b69d-56b9469ab27b.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-1-abstract">1. Abstract</h1>
<p>The paper aims to fulfill Speech-to-Text tasks with speaker recognition, so-called diarization, leveraging the Whisper model for transcription, ECAPA-TDNN for speaker embeddings, and Agglomerative Hierarchical Clustering to identify who spoke when.</p>
<h1 id="heading-2-introduction">2. Introduction</h1>
<ul>
<li>why did the authors do this research?</li>
</ul>
<p>In modern society, meeting transcription and audio processing are important. Yet audio-processing accuracy still needs improvement before it can reach service level.</p>
<ul>
<li><p>objects</p>
<ul>
<li><p>identification and segmentation of speakers</p>
</li>
<li><p>content comprehension</p>
</li>
</ul>
</li>
<li><p>with what?</p>
<ul>
<li><p>speaker embeddings</p>
<ul>
<li><p>various acoustic features within speech</p>
</li>
<li><p>discern between speakers</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<h3 id="heading-models-and-packages">Models and Packages</h3>
<p>In this paper the research was conducted using the Whisper model for transcription, the Pyannote toolkit for speaker embeddings, and Agglomerative Hierarchical Clustering for grouping similar embeddings. In short, the key components are listed below.</p>
<ul>
<li><strong>Whisper</strong></li>
</ul>
<p>Whisper was trained on a multilingual supervised dataset so it can differentiate languages, linguistic nuances, and accents. It supports many languages; details are given later below.</p>
<ul>
<li><strong>Pyannote</strong></li>
</ul>
<p>Pyannote is a model for extracting and manipulating speaker embeddings, which encapsulate the unique acoustic characteristics of each speaker. It is the key component for diarization.</p>
<ul>
<li><strong>Agglomerative Hierarchical Clustering</strong></li>
</ul>
<p>The embedded data is then grouped into clusters using this algorithm, after which each cluster can be labeled as a speaker.</p>
<h3 id="heading-in-summary">in summary</h3>
<ul>
<li><p>whisper: transcription of audio data</p>
</li>
<li><p>pyannote: extract embeddings from acoustic features</p>
</li>
<li><p>Agglomerative hierarchical clustering: unveil relationship between speakers</p>
</li>
</ul>
<h1 id="heading-3-related-work">3. Related work</h1>
<p>RNNs and LSTMs were used in diarization research to capture sequential features of audio, but they struggled with long sequences. CNNs were also applied: they excelled at extracting hierarchical spatial features and patterns from spectrograms, but were limited by variable input lengths. Once the Transformer was released, it was adopted for diarization research and applications, as it performed well across many tasks.</p>
<h3 id="heading-embeddings">Embeddings</h3>
<p>Much research has been conducted on extracting features for diarization. X-vectors, based on a time delay neural network (TDNN), are introduced as a state-of-the-art approach for speaker verification.</p>
<h3 id="heading-proposal">Proposal</h3>
<ul>
<li><p>to overcome the challenges and limitations stated above, this paper proposes the methodology of <code>Emphasized Channel Attention, Propagation, and Aggregation (ECAPA-TDNN)</code></p>
<ul>
<li><p>ECAPA-TDNN</p>
<ul>
<li><p>an advanced iteration of TDNN</p>
</li>
<li><p>it uses</p>
<ul>
<li><p>attention mechanism</p>
</li>
<li><p>multilayer feature aggregation (MFA)</p>
</li>
<li><p>squeeze excitation modules</p>
</li>
<li><p>residual blocks</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751704624229/cb253546-62e6-48ee-bcb4-49e4f902c52d.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-k-means">k-means</h3>
<p>k-means is efficient on large datasets but struggles when one speaker is dominant; agglomerative hierarchical clustering can handle such imbalances.</p>
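<p>A bare-bones sketch of the agglomerative idea, using centroid distance on toy 2-D points (real systems cluster high-dimensional speaker embeddings and typically use library implementations):</p>

```python
def centroid(cluster):
    """Mean point of a cluster of n-dimensional tuples."""
    return [sum(axis) / len(cluster) for axis in zip(*cluster)]

def dist(a, b):
    """Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points, threshold):
    """Bottom-up merging: start with one cluster per point and repeatedly join
    the two clusters with the closest centroids until none are within threshold."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best_i, best_j, best_d = 0, 1, float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if d < best_d:
                    best_i, best_j, best_d = i, j, d
        if best_d > threshold:
            break
        clusters[best_i].extend(clusters.pop(best_j))
    return clusters
```

<p>The distance threshold, rather than a fixed cluster count, is what lets AHC cope with an unknown number of speakers.</p>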
<h1 id="heading-4-methodology">4. Methodology</h1>
<h3 id="heading-whisper">Whisper</h3>
<p>A notable point about the Whisper model is that it is built on the transformer architecture, with an encoder and a decoder. It supports 99 languages, with a word error rate (WER) of 4.2%. Korean is much more complex to score and cannot be measured well by WER, as its spacing and letter-combination system differ; the character error rate (CER) is adopted instead to check performance.</p>
<ul>
<li><p>4.2% WER</p>
</li>
<li><p>99 languages</p>
</li>
<li><p>680,000h audio (online platform)</p>
<ul>
<li><p>563,000h english</p>
</li>
<li><p>117,000h other languages</p>
</li>
</ul>
</li>
<li><p>robustness against accents, ambient disturbances</p>
</li>
</ul>
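<p>WER and CER share the same edit-distance core and differ only in the token unit (words for English, characters for scripts like Korean); a compact sketch:</p>

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (r != h))    # substitution (free on match)
            prev = cur
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: edit distance over word tokens."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: edit distance over characters."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

<p>Because Korean packs several letters into one syllable block and spaces inconsistently, a single substituted block inflates WER far more than CER, which is why CER is preferred for Korean.</p>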
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749974516385/53e6040d-69a8-41bc-9e7b-e8005f44aa41.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>currently has large-v3 and turbo model</p>
</li>
<li><p>utilizes encoder-decoder transformer</p>
<ul>
<li><p>encoder: derives a latent representation from speech</p>
</li>
<li><p>decoder: generates text, based on the latent representation</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-other-speech-to-text-models">Other Speech-to-Text models</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749883556181/908164d3-4036-4d08-9986-90c7ea6d677e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-suggested-algorithm">Suggested algorithm</h3>
<p>The audio file is processed in 16 kHz PCM format, normalized to the range -1 to 1. The signal is then converted to an 80-channel Mel spectrogram, 80 channels being the most common choice, accepted from experience.</p>
<h3 id="heading-encoder-decoder">Encoder, decoder</h3>
<ul>
<li><p>conversion involves</p>
<ul>
<li><p>window size: 25ms</p>
</li>
<li><p>stride: 10ms</p>
</li>
<li><p>segments: 30s</p>
</li>
</ul>
</li>
<li><p>encoder operates per 30-second segment, to extract features</p>
<ul>
<li><p>it involves two GELU activated convolutions</p>
<ul>
<li>filter size of 3 for input embeddings</li>
</ul>
</li>
<li><p>position embedding uses Sine function</p>
<ul>
<li>performed by transformer</li>
</ul>
</li>
</ul>
</li>
<li><p>decoder</p>
<ul>
<li><p>calculates probability based on the latent representation</p>
</li>
<li><p>token determination via Greedy Search or Beam Search</p>
</li>
<li><p>output: maximum 224 tokens per 30-second segment</p>
</li>
</ul>
</li>
</ul>
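<p>The framing parameters above (25 ms window, 10 ms stride, 30-second segments at 16 kHz) can be sanity-checked with a little arithmetic (my own sketch; values are from the paper):</p>

```python
SR = 16_000              # 16 kHz PCM input
WIN = int(0.025 * SR)    # 25 ms window  -> 400 samples
HOP = int(0.010 * SR)    # 10 ms stride  -> 160 samples
SEGMENT = 30 * SR        # 30-second segment -> 480,000 samples

def n_frames(n_samples, win=WIN, hop=HOP):
    """Number of full analysis windows that fit into the signal."""
    return 1 + (n_samples - win) // hop
```

<p>That gives roughly 3,000 feature frames per 30-second segment, which is the sequence the encoder attends over.</p>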
<h3 id="heading-process-in-short">Process in short,</h3>
<ol>
<li><p>transcription</p>
<ul>
<li>spoken content (Whisper)</li>
</ul>
</li>
<li><p>speaker embeddings</p>
<ul>
<li><p>speaker embeddings from audio (unique features of individuals)</p>
<ul>
<li>bases for analysis</li>
</ul>
</li>
</ul>
</li>
<li><p>clustering</p>
<ul>
<li>clustering using Agglomerative hierarchical clustering method based on similarity</li>
</ul>
</li>
<li><p>output</p>
<ul>
<li><p>will be able to see who spoke when</p>
<ul>
<li><p>Whisper model’s output has time information with transcription</p>
</li>
<li><p>audio will be cut according to the time information and determines the speaker</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
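<p>Step 4 above, attaching speakers to transcript segments via their timestamps, can be sketched as simple overlap matching (hypothetical glue code, not the paper's exact implementation):</p>

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Attach a speaker label to each transcribed segment by maximum time
    overlap with the diarization turns."""
    labeled = []
    for start, end, text in transcript_segments:         # (start, end, text) from STT
        best_spk, best_overlap = None, 0.0
        for turn_start, turn_end, spk in speaker_turns:  # (start, end, speaker) from clustering
            overlap = max(0.0, min(end, turn_end) - max(start, turn_start))
            if overlap > best_overlap:
                best_spk, best_overlap = spk, overlap
        labeled.append((start, end, best_spk, text))
    return labeled
```

<p>This is where Whisper's per-segment timestamps and the clustering output meet to produce "who spoke when".</p>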
<h3 id="heading-dataset">Dataset</h3>
<p>The paper used the VoxCeleb1 and VoxCeleb2 datasets, collected from YouTube videos of celebrities. VoxCeleb is widely used in research because it contains a range of voices, from clean recordings to voices with background noise.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751704693562/d7bc77d9-fbbf-4256-aaa5-509c7487969e.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-reflection">Reflection</h1>
<p>It was indeed intriguing and informative as a kick-off into the STT field. What was lacking, however, was detail on how the digitalized sound is transformed into a Mel spectrogram and how the data is processed; overall, the explanation of the process was a bit of a let-down.</p>
<p>Secondly, the paper shows result data comparing two models, yet it does not compare train and test error rates, so there is no way to tell whether the model was overfitted.</p>
<p>ECAPA-TDNN will be a good choice for diarization if the number of speakers is fixed in every situation. But in real life there are always exceptions: in a meeting, a new member may join and an existing member may leave. I personally think the embeddings and clustering should account for such situations, where the embedding information can change at any time.</p>
<p>With that realization, the next paper to read will be the seminal paper by the Oxford researchers who established the VoxCeleb dataset, or something else if another interesting paper comes along.</p>
<h1 id="heading-reference">Reference</h1>
<p>[1] R. D. Shankar, R. B. Manjula, and R. C. Biradar, "Revolutionizing Speaker Recognition and Diarization: A Novel Methodology in Speech Analysis," <em>SN Computer Science</em>, vol. 6, no. 87, 2025. <a target="_blank" href="https://doi.org/10.1007/s42979-024-03509-6">https://doi.org/10.1007/s42979-024-03509-6</a></p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] Training a Helpful and Harmless Assistant withReinforcement Learning from Human Feedback]]></title><description><![CDATA[Since I have joined a team which deals with AI and LLMs, I have decided to review a paper in relation to an LLM which deals with reinforcement learning of LLM and how it turns out to be better than the zero-shot learning.
It had been only 3 days in t...]]></description><link>https://ramieeee.me/paper-review-training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback</link><guid isPermaLink="true">https://ramieeee.me/paper-review-training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback</guid><category><![CDATA[Reinforcement Learning]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 25 May 2025 08:04:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748160260606/de66987d-98f3-4fb6-bfdd-937f4029b4bc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Since joining a team that works with AI and LLMs, I decided to review a paper on reinforcement learning for LLMs and how it turns out to be better than zero-shot learning.</p>
<p>It had been only 3 days since I joined the team, but I needed to figure out my pathway and which skills I would develop and carry.</p>
<p>There are loads of AI domains I would like to get involved in, such as algorithm design, regression research, or LLMs, but since our team currently focuses on LLMs and fine-tuning, I decided to study further how to evaluate LLM outputs and apply reinforcement learning to them.</p>
<p>Anthropic conducted research on Reinforcement Learning from Human Feedback (RLHF), and it is intriguing.</p>
<h1 id="heading-1-introduction">1. Introduction</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748156668382/475b589f-0506-46ac-9c7b-e215be04dca6.png" alt class="image--center mx-auto" /></p>
<p>Learning from human feedback is not difficult to understand: there are two datasets sorted by humans, and a model is fine-tuned on them.</p>
<h3 id="heading-datasets">Datasets</h3>
<ul>
<li><p>Helpfulness</p>
<ul>
<li>answering, writing, editing, documents etc.</li>
</ul>
</li>
<li><p>Harmlessness</p>
<ul>
<li>not assisting harmful goals, like bank robbery</li>
</ul>
</li>
</ul>
<p>These two datasets will be categorised by humans and it is totally up to the crowdworkers to decide which category the text falls into.</p>
<h3 id="heading-trade-off">Trade-off</h3>
<p>An interesting result though, when the model learns from a single dataset, which is either the helpful dataset or the harmless dataset, it has got a tendency to show the trade-offs in the scores.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748157548045/0970820f-abc7-4a54-97cd-b6bd61e203f2.png" alt class="image--center mx-auto" /></p>
<p>As shown above, the green triangle plot(Online Helpful RLHF) scores top with Elo score on the left, yet on the right it is least preferred by crowdworkers when scoring for harmlessness. Technically, a bias is formed when only a single dataset is used for training the model.</p>
<p>When trained with the two datasets, both helpful and harmless datasets, it shows a meaningful result that the few-shot accuracy shows better performance than the zero-shot accuracy in general NLP performance tests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748158086082/499df7af-b7f1-400c-b76f-1999eab58f5a.png" alt class="image--center mx-auto" /></p>
<p>The graphs indicate that the bigger models the better performance the models show in general.</p>
<h1 id="heading-2-suggested-rlhf">2. Suggested RLHF</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748158384034/cc097bb3-f3c1-4d20-832d-633b4e68dfb5.png" alt class="image--center mx-auto" /></p>
<p>This is the process of RLHF, from data collection to reinforcement learning. It was somewhat complicated for me to grasp as a whole, so I summarised the entire process in my own words. In a nutshell, there are roughly three steps, as below.</p>
<h3 id="heading-1-ab-test">1) AB test</h3>
<ul>
<li>crowdworker determines the outputs.</li>
</ul>
<h3 id="heading-2-preference-model">2) Preference Model</h3>
<ul>
<li><p>input: prompt, AB answers, preferred answer(human feedback)</p>
</li>
<li><p>output: scores</p>
</li>
</ul>
<h3 id="heading-3-rlhf">3) RLHF</h3>
<ul>
<li>input: prompt, RLHF answer, PM score</li>
</ul>
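<p>The preference model step is commonly trained with a pairwise (Bradley-Terry style) loss that pushes the score of the human-preferred answer above the rejected one; a sketch of that standard formulation (my summary, not code from the paper):</p>

```python
import math

def preference_loss(score_chosen, score_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the PM scores the
    human-preferred answer higher, large when it prefers the rejected one."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

<p>The RLHF stage then uses the trained PM's score as the reward signal for the policy.</p>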
<h1 id="heading-3-evaluation">3. Evaluation</h1>
<p>The standard NLP test methods are applied to check the model, but as mentioned above, when the models are fine-tuned with the datasets (helpful and harmless), the bias is formed.</p>
<p>Test methods are as below.</p>
<ul>
<li><p>MMLU: Benchmark covering many domains with high-level questions (history, law, medicine, etc.)</p>
</li>
<li><p>LAMBADA: A task to predict the last word</p>
</li>
<li><p>HellaSwag: A task to choose an appropriate context</p>
</li>
<li><p>OpenBookQA: Basic science knowledge</p>
</li>
<li><p>ARC-Easy: Basic science knowledge (easy questions)</p>
</li>
<li><p>ARC-Challenge: higher-level questions which require reasoning</p>
</li>
<li><p>TriviaQA: trivia questions collected from the internet</p>
</li>
</ul>
<h1 id="heading-4-reflection">4. Reflection</h1>
<p>The Anthropic team indicated that the model shows better helpfulness scores when evaluated by humans, but they are uncertain why. They assume it is because of the datasets, but further research is required to determine whether the datasets lack correctness for fine-tuning.</p>
<h3 id="heading-evaluation">Evaluation</h3>
<p>I have always wondered how to assess a model in a specific domain when an LLM is applied. Should a human give feedback on the answers the LLM produces? This paper at least shows that humans (crowdworkers) did the job of sorting the datasets. From this research I took the idea that human feedback is essential in output evaluation, even though it is time-consuming and costly, human labor being the most expensive resource.</p>
<h3 id="heading-datasets-1">Datasets</h3>
<p>A dataset in the range of 100-500k examples would be ideal for fine-tuning, I thought. The paper uses static and online datasets; since it was released in 2022, I think there are now better ideas and methods for building datasets using LLMs and evaluating them.</p>
<h1 id="heading-5-reference">5. Reference</h1>
<p>[1] Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., et al. (2022). <em>Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback</em>. arXiv preprint arXiv:2204.05862.</p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] Object Recognition and Positioning with Neural Networks: Single Ultrasonic Sensor Scanning Approach]]></title><description><![CDATA[It is not all of a sudden that I was intrigued by the subject Ultrasonic scanning to detect objects or classify the objects. I once randomly thought about what I would choose for PhD paper if I had to go through PhD course, then ultrasound was the ke...]]></description><link>https://ramieeee.me/paper-review-object-recognition-and-positioning-with-neural-networks-single-ultrasonic-sensor-scanning-approach</link><guid isPermaLink="true">https://ramieeee.me/paper-review-object-recognition-and-positioning-with-neural-networks-single-ultrasonic-sensor-scanning-approach</guid><category><![CDATA[coordinates]]></category><category><![CDATA[Ultrasonic]]></category><category><![CDATA[#waves]]></category><category><![CDATA[classification]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 27 Apr 2025 10:59:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751548846/89d4c5d5-83df-4efa-95f7-3cec2c1f283e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It was no sudden whim that I became intrigued by <em>ultrasonic</em> scanning to detect or classify objects. I once idly wondered what topic I would choose if I ever pursued a PhD, and ultrasound was the key idea that popped into my head. To strengthen the idea, I searched the web for papers and found this one. My ideas about <em>ultrasound</em> were these:</p>
<ul>
<li><p>has to be able to detect objects</p>
</li>
<li><p>reads the environment and could tell it by the signal</p>
</li>
<li><p>related to scanning the environment and able to visualize it</p>
</li>
<li><p>low cost</p>
</li>
</ul>
<p>Low cost matters, of course, because this is a personal project; it would be a different story if I were part of a university PhD course, which I am not, so the cost matters.</p>
<p>I had a thorough review of this paper by Turkish researchers; according to the paper, the authors belong to the Defense Industries Research and Development Institute of Turkey, which gave me confidence that this paper would be meticulous and analytical.</p>
<h1 id="heading-1-introduction">1. Introduction</h1>
<h3 id="heading-why-ultrasonic-sensor">Why Ultrasonic sensor?</h3>
<ul>
<li><p>Low cost compared to LIDAR</p>
<p>  The first and most critical reason for using ultrasonic sensing is cost. Compared to an ultrasonic sensor, other sensors are quite heavy in price; LIDAR, for example, is a great sensor with stunning visualization and object detection, yet it is expensive and hard for individuals to access.</p>
</li>
<li><p>Useful when optical sensing is not possible</p>
<p>  LIDAR, again, is strong when light can travel to objects. However, when optical sensing is disturbed by conditions such as weather, it may not be able to collect information from objects; there will be noise, and the sensing task will not be fulfilled completely.</p>
</li>
</ul>
<h3 id="heading-aim">Aim</h3>
<p>The paper aims to provide methodologies for 1) object classification and 2) coordinate estimation with a single ultrasonic sensor, to be utilized in robotic or human applications such as helmets scanning objects in real time.</p>
<h1 id="heading-2-proposed-methods">2. Proposed methods</h1>
<h3 id="heading-data-collection">Data Collection</h3>
<p>The authors placed the sensor on a 3D printer for automated data collection. Every 2 mm along the X direction of the 3D printer, data was collected through the ultrasonic sensor, across 116 different scenarios.</p>
<h3 id="heading-dataset">Dataset</h3>
<p>Three objects in three classes were used for the dataset: a large object (40 mm), a medium object (20 mm) and a narrow object (10 mm).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751384243/7c1ec74c-7429-4357-b714-0fbd7f4b0540.png" alt class="image--center mx-auto" /></p>
<p>The process of the ultrasound sensor is as below.</p>
<ol>
<li><p>The transmitter sends a signal to the objects</p>
</li>
<li><p>The receiver takes the reflected signal and processes it through an amplifier</p>
</li>
<li><p>The signal is digitized</p>
</li>
<li><p>The digital data is sent over USB</p>
</li>
</ol>
<h3 id="heading-algorithm">Algorithm</h3>
<p>A CNN was applied to process the signal information; this was not a multi-modal model. The classic CNN model consists of three 2D convolutional layers and three max-pooling layers, followed by a flatten layer. A dense layer takes the feature vector: softmax classifies the objects, and another dense layer solves a linear regression for coordinate estimation.</p>
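<p>To make the shrinking of the feature maps concrete, here is a small sketch of how a square input would pass through three conv + max-pool stages. The input size (64 x 64), kernel size (3 x 3), stride and pool size (2 x 2) are my assumptions for illustration, not values taken from the paper.</p>

```python
def conv2d_size(n, kernel=3, stride=1, padding=0):
    """Output side length of a square 2D convolution."""
    return (n + 2 * padding - kernel) // stride + 1

def maxpool_size(n, pool=2):
    """Output side length of a square max-pooling layer."""
    return n // pool

n = 64  # assumed square input size
for stage in range(3):       # the 3 conv + 3 max-pool stages
    n = conv2d_size(n)       # conv shrinks the side by kernel - 1
    n = maxpool_size(n)      # pooling halves the side
    print(f"after stage {stage + 1}: {n}x{n}")

# the flatten layer then feeds two dense heads:
# one softmax head (class) and one linear head (coordinates)
flat = n * n
```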
<h3 id="heading-preprocessing">Preprocessing</h3>
<p>Data was collected by placing the objects at a certain position on the y-axis and at random positions between 0 and 40 cm on the x-axis, measuring every 2 mm with the ultrasonic sensor; a 5 mm step would be too coarse and miss some information. The data then looks like a fluctuation on a 2D graph where the object reflection is detected. The envelope of the signal is then extracted from the raw data. When the two features are combined, a pillar-looking shape appears on the graph, seemingly very disheveled with a lot of noise.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751412822/73e30f45-9b5e-4e10-b1cd-1c5f4d997f30.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751420906/ef524672-f44d-41fa-9277-b11d3db5936d.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751436083/89293072-8400-4e88-8a75-6ef6383af71c.png" alt class="image--center mx-auto" /></p>
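<p>As a rough illustration of the envelope-extraction step, here is a sketch using rectification followed by a sliding-window maximum. The paper's exact envelope method may differ, and the echo signal below is synthetic.</p>

```python
import numpy as np

def envelope(signal, window=16):
    """Upper envelope: rectify, then take a sliding-window maximum."""
    rectified = np.abs(signal)
    half = window // 2
    padded = np.pad(rectified, (half, half), mode="edge")
    return np.array([padded[i:i + window].max() for i in range(len(signal))])

# synthetic echo: a decaying burst centred around sample 200
t = np.arange(1000)
raw = np.sin(2 * np.pi * t / 8) * np.exp(-(((t - 200) / 50.0) ** 2))

env = envelope(raw)  # smooth outline of the burst, as in the figures above
```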
<h3 id="heading-normalization">Normalization</h3>
<p>There are two normalization processes.</p>
<ul>
<li><p>0-255 to 0-1 scale</p>
<p>  This is necessary in image preprocessing: the model takes inputs with values from 0 to 1 rather than large values, which saves machine memory and keeps training stable, as the error could grow intimidatingly when MSE is applied for error calculation.</p>
</li>
<li><p>Resize the input and convert to grayscale</p>
<p>  The picture should be resized to a fixed format, such as 12 x 12, so the model takes the input accordingly. Grayscale conversion was also applied in this algorithm.</p>
</li>
</ul>
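<p>The two normalization steps can be sketched roughly as below. The 12 x 12 target size is the example mentioned above; the luminance weights and the nearest-neighbour resize are my own assumptions, not the paper's.</p>

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse 3 channels with ITU-R BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=12):
    """Nearest-neighbour resize of a square single-channel image."""
    idx = np.arange(size) * img.shape[0] // size
    return img[np.ix_(idx, idx)]

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(48, 48, 3)).astype(np.float64)

gray = to_grayscale(rgb)      # (48, 48), values still 0-255
small = resize_nearest(gray)  # (12, 12), the fixed input format
scaled = small / 255.0        # 0-255 rescaled to the 0-1 range
```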
<h1 id="heading-3-key-point">3. Key Point</h1>
<p>Divided into big categories, they are as below.</p>
<ul>
<li><p>Data collection</p>
</li>
<li><p>Data processing</p>
</li>
<li><p>CNN to determine values</p>
</li>
</ul>
<p>It focuses on a CNN performing multiple tasks, and this differentiates the paper from other research: a traditional CNN could perform such tasks with ultrasonic image data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745751480387/d87c259b-a676-4d17-b10e-c1f408f7c90c.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-4-reflection">4. Reflection</h1>
<p>The paper is indeed outstanding in detailing how the data was prepared and processed. It may be a barrier, though, that without knowledge of sound waves, wave-related graphs, and the devices used for wave detection, this would be quite a handful of a paper to review. The paper introduces the modules and models of the devices; various kinds of these were new to me, as I am not familiar with sound waves or anything related to this domain.</p>
<p>I would need to see many more ultrasonic studies on environmental conditions like roads, metals or other materials.</p>
<h3 id="heading-improvements">Improvements</h3>
<ul>
<li><p><strong>Score</strong></p>
<p>  The model itself was great at detecting objects and estimating their coordinates. However, the F1-score for multiple-object detection was unexpectedly low: although the first object scored over 90%, from the second object onward the score decreased significantly, some dropping to 79%. Such a score could be called adequately reliable, but it also shows enough error that the method may not be appropriate for adoption in industry.</p>
</li>
<li><p><strong>Number of Sensor</strong></p>
<p>  The aim of the paper is a single-sensor algorithm, and it was brilliant enough to show the possibility of object detection using only one ultrasonic sensor. However, it has limitations when addressing many objects with a single sensor. As mentioned in the paper, when Gaussian noise and salt-and-pepper noise were applied to the test dataset to mimic real-world conditions, the score dropped to 70% for the third object. I strongly believe that adding more sensors to this research would improve the quality of the image processing and of the sensing data.</p>
</li>
<li><p><strong>Test method</strong></p>
<p>  I noticed that the test was conducted by placing objects at a certain y-axis position, not at random points. This could be a limitation of the suggested method. It would be more reliable if the test were conducted with various positions on both the x and y axes.</p>
</li>
</ul>
<h1 id="heading-5-reference">5. Reference</h1>
<p>[1] Karagoz A., Dindis G., <em>Object Recognition and Positioning with Neural Networks: Single Ultrasonic Sensor Scanning Approach</em>, Sensors 2025, 25(4), 1086, <a target="_blank" href="https://doi.org/10.3390/s25041086">https://doi.org/10.3390/s25041086</a> (CC BY 4.0)</p>
]]></content:encoded></item><item><title><![CDATA[[Paper Review] DeepSeek-R1 Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]]></title><description><![CDATA[It has not been a long time since DeepSeek was released. It was indeed a shock to those who are in AI industry.
I was not familiar with LLM’s algorithm and the computing resource usage of the LLMs. All I was doing was to utilise the LLM APIs for deve...]]></description><link>https://ramieeee.me/paper-review-deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning</link><guid isPermaLink="true">https://ramieeee.me/paper-review-deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning</guid><category><![CDATA[Deepseek]]></category><category><![CDATA[llm]]></category><category><![CDATA[Reinforcement Learning]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sun, 20 Apr 2025 12:11:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745151315420/5c785e0c-f3b5-4df1-97fa-a420779b9422.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It has not been a long time since DeepSeek was released. It was indeed a shock to those who are in AI industry.</p>
<p>I was not familiar with LLM algorithms or the computing resource usage of LLMs. All I was doing was utilising the LLM developer APIs to build pipelines for automation. Other than that, there was not much to consider.</p>
<p>When DeepSeek struck the LLM industry, people eagerly talked about its algorithm and how light the model is compared to others such as GPT from OpenAI and Llama from Meta.</p>
<p>I would like to meticulously analyze DeepSeek, especially how it was designed to be distilled into open-source models and how the datasets were used to train the model.</p>
<h1 id="heading-1-abstract">1. Abstract</h1>
<p>In the paper, two reasoning models are introduced: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is the model before supervised fine-tuning. In the abstract, the two models are summarised with the features below.</p>
<ul>
<li><p><strong>DeepSeek-R1-Zero</strong></p>
<ul>
<li><p>without supervised fine-tuning (SFT)</p>
</li>
<li><p>pros</p>
<ul>
<li><p>remarkable reasoning capabilities</p>
</li>
<li><p>naturally emerges with numerous powerful and intriguing reasoning behaviors</p>
</li>
</ul>
</li>
<li><p>cons</p>
<ul>
<li><p>poor readability (humans can hardly understand the output)</p>
</li>
<li><p>language mixing</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>DeepSeek-R1</strong></p>
<ul>
<li><p>developed to overcome the shortcomings of DeepSeek-R1-Zero</p>
</li>
<li><p>incorporates multi-stage training and cold-start data before RL.</p>
</li>
<li><p>comparable to OpenAI-o1-1217</p>
</li>
</ul>
</li>
</ul>
<hr />
<p>According to the paper, the benchmark performance of DeepSeek-R1 surpasses OpenAI-o1-mini and shows similar performance to OpenAI-o1-1217. The performance metrics are in the table below.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>AIME2024</td><td>Codeforces</td><td>GPQA Diamonds</td><td>MATH-500</td><td>MMLU</td><td>SWE-bench Verified</td></tr>
</thead>
<tbody>
<tr>
<td>DeepSeek-R1</td><td>79.8</td><td>96.3</td><td>71.5</td><td>93.7</td><td>90.8</td><td>49.2</td></tr>
<tr>
<td>OpenAI-o1-1217</td><td>79.2</td><td>96.6</td><td>75.7</td><td>96.4</td><td>91.8</td><td>48.9</td></tr>
<tr>
<td>DeepSeek-R1-32B</td><td>72.6</td><td>90.6</td><td>62.1</td><td>94.3</td><td>87.4</td><td>36.8</td></tr>
<tr>
<td>OpenAI-o1-mini</td><td>63.6</td><td>93.4</td><td>60.0</td><td>90.0</td><td>85.2</td><td>41.6</td></tr>
<tr>
<td>DeepSeek-V3</td><td>39.2</td><td>58.7</td><td>59.1</td><td>90.2</td><td>88.5</td><td>42.0</td></tr>
</tbody>
</table>
</div><h1 id="heading-2-introduction">2. Introduction</h1>
<p>The paper kicks off by mentioning that OpenAI's o1 was the first model introduced for reasoning tasks, but that effective test-time scaling remains a challenge and other works have not been comparable to the o1 models.</p>
<ul>
<li><p>Why was the DeepSeek-R1 model developed?</p>
<ul>
<li>To improve language model reasoning capabilities with RL(reinforcement learning)</li>
</ul>
</li>
<li><p>How?</p>
<ul>
<li><p>DeepSeek-R1-Zero: DeepSeek-V3-Base's outputs pass through GRPO, which updates the parameters in accordance with the scores</p>
</li>
<li><p>DeepSeek-R1: RL with cold-start data and a multi-stage training pipeline</p>
</li>
</ul>
</li>
</ul>
<p>Basically, this paper emphasises the fact that an LLM can be optimised and incentivised through RL without supervised fine-tuning (SFT).</p>
<h1 id="heading-3-approach">3. Approach</h1>
<p>In chapter 2, the DeepSeek team describes the algorithm Group Relative Policy Optimisation (GRPO). This algorithm trains the model by optimising outputs, comparing a new policy against an old policy. The paper also says the critic model is forgone; it would be the same size as the policy model, which I think is one of the factors that makes DeepSeek-R1-Zero lighter, as it removes a significant amount of computation.</p>
<p>The GRPO equation is shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743841353112/95ca054a-d713-4f37-a29d-b9804038be5c.png" alt class="image--center mx-auto" /></p>
<p>The gist of the algorithm is to sample outputs from the old policy and calculate their probabilities. The ratio of the new policy's output probability to the old policy's is then <em>clipped</em> within the range \( [1-\epsilon,\ 1+\epsilon] \) to prevent too much bias toward either the new or the old policy.</p>
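<p>To see the clipping in action, here is a toy sketch of the clipped surrogate term with group-relative advantages. All the numbers (rewards, log-probabilities, epsilon) are made up for illustration, and the KL-penalty term of the full GRPO objective is omitted.</p>

```python
import numpy as np

def grpo_clipped_term(logp_new, logp_old, rewards, eps=0.2):
    """Clipped surrogate term for one group of sampled outputs."""
    # group-relative advantage: normalise rewards within the group
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = np.exp(logp_new - logp_old)          # pi_new / pi_old per output
    clipped = np.clip(ratio, 1 - eps, 1 + eps)   # keep the ratio near 1
    # pessimistic minimum of the unclipped and clipped surrogate terms
    return np.minimum(ratio * adv, clipped * adv).mean()

logp_old = np.array([-1.0, -2.0, -0.5, -1.5])   # made-up log-probs
logp_new = np.array([-0.8, -2.1, -0.4, -1.6])
rewards = np.array([1.0, 0.0, 1.0, 0.0])        # made-up group rewards

obj = grpo_clipped_term(logp_new, logp_old, rewards)
```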
<h3 id="heading-rewards">Rewards</h3>
<p>There are two reward methods on which the model relies. The model takes a reward when its output scores meaningfully, and it leverages the rewards to update its weights in the direction evaluated as good, whereas low-scoring behaviour is refined away so only the useful behaviour remains.</p>
<ul>
<li><p>accuracy rewards: evaluate whether the response is correct, e.g. math problem results or classification results.</p>
</li>
<li><p>format rewards: enforce the model to put its reasoning process inside tags like '&lt;think&gt;&lt;/think&gt;'</p>
</li>
</ul>
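<p>A format reward of this kind can be sketched with a simple check; the pattern and the reward values here are my assumptions, not the paper's exact implementation.</p>

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the reasoning sits in a single <think>...</think>
    block followed by an answer, else 0.0 (assumed reward values)."""
    pattern = r"^<think>.+?</think>.+$"
    return 1.0 if re.match(pattern, response, re.DOTALL) else 0.0

good = "<think>2 + 2 = 4 because ...</think> The answer is 4."
bad = "The answer is 4."   # no reasoning tags, so no reward
```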
<h3 id="heading-how-test-was-done">How Test Was Done</h3>
<ul>
<li>AIME accuracy: For each question, 16 answers were selected and the overall average accuracy was calculated</li>
</ul>
<h1 id="heading-4-key-point">4. Key Point</h1>
<p>The DeepSeek-R1-Zero model showed that it is genuinely learning autonomously: the more training steps it takes, the more time it takes to respond, which means it thinks more before responding.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743947698073/8f2f53b5-25e7-4742-a8ab-36b761f5abca.png" alt class="image--center mx-auto" /></p>
<p>Through its output after the GRPO calculation, the model teaches itself with the knowledge it produces, taking it back to strengthen what it is sure about. It even reaches an "aha" moment, when it realises by itself what the knowledge should be or is meant to be.</p>
<p>Yet still, DeepSeek-R1-Zero's reasoning process has the drawback of <em>poor readability</em>. If humans cannot understand the outputs its weights produce, they are useless. This is the reason why DeepSeek-R1 is designed to perform with robust readability.</p>
<h3 id="heading-deepseek-r1">DeepSeek-R1?</h3>
<p>The DeepSeek-R1 model is supervised-fine-tuned on a small amount of data sampled from DeepSeek-R1-Zero's output. Human annotators filtered and sampled the data used for DeepSeek-R1, enforcing better readability and reasoning ability. Approximately 600k reasoning-related training samples and 200k non-reasoning training samples were collected to feed the model.</p>
<h3 id="heading-process-in-summary">Process in Summary</h3>
<p>The process in this paper for reinforcement learning of the models is summarised as below.</p>
<ul>
<li>DeepSeek-V3-Base -&gt; DeepSeek-R1-Zero -&gt; DeepSeek-R1(with Zero's data for cold-start) -&gt; DeepSeek-R1(with RL as Zero model did) -&gt; Distillation</li>
</ul>
<h3 id="heading-how-test-was-done-1">How Test Was Done</h3>
<p>An intriguing part is the testing. There are many methods for testing LLM performance in specific domains like math and reasoning. The paper states that 16 answers were sampled per question and the overall average accuracy was calculated.</p>
<h1 id="heading-5-reflection">5. Reflection</h1>
<p>This was my first LLM paper review. I felt awkward with the terms used in the paper like checkpoint, RL for LLMs, rewards and so on. It had been quite a challenging, yet motivating, read. It was also thrilling that the model could learn by itself, choosing rewards and keeping the right outputs so it could develop its own universe of tensors. I once doubted whether developing LLMs with vectors and tensors could be the wrong methodology and whether there are other ways to build better-performing models, but then I realised that numbers and floats in tensors are the most efficient way to store and calculate for an LLM, and maybe it is by nature meant to be this way. A <em>number</em> is, for now, the only way to express the data a cell holds.</p>
<p>Still, there is the limitation that the model is robust only in English and Chinese, since the base model was trained mainly on these two languages; they were the only languages in which it performed well. The paper mentions that future work is to add more languages so the model is not restricted to only a few.</p>
<p>My personal interest in LLMs, if there should be one, is making an sLLM with low computational resource requirements and faster performance, so it could be adopted in any circumstance, whether on hardware such as embedded devices or in any domain.</p>
<h1 id="heading-6-reference">6. Reference</h1>
<p>[1] DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., et al. (2024). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 80 Remove Duplicates from Sorted Array II]]></title><description><![CDATA[Understanding the Problem
An integer list of nums has numbers that could be either duplicate or non-duplicate. It is to leave 1-2 values for each number in the list in-place, which means it is nothing to do with returning any value from the method.
#...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-80-remove-duplicates-from-sorted-array-ii</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-80-remove-duplicates-from-sorted-array-ii</guid><category><![CDATA[leetcode]]></category><category><![CDATA[string]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Wed, 02 Apr 2025 04:31:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743522203673/04276cb7-7df4-481b-91df-ba9a5d4a4f3f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>An integer list <code>nums</code> has numbers that may or may not be duplicated. The task is to leave 1–2 occurrences of each number in the list, in-place; the point is modifying the list itself, not returning a new one.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: nums = [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>]
Output: nums = [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>] <span class="hljs-comment"># it is not a returned value. Must fix nums list from the parameter</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I had experience removing duplicates from a given list, and this was pretty much the same, but this time each unique element was allowed up to 2 occurrences.</p>
<p>I was going to get the unique values with <code>set()</code> and iterate over them.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>The problem solving steps are as below.</p>
<ol>
<li><p>I get the unique values into <code>unique_n</code> with <code>set(nums)</code></p>
</li>
<li><p>Then I check each unique value <code>n</code></p>
</li>
<li><p>If <code>n</code> occurs more than twice, remove occurrences in a <code>while</code> loop</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">removeDuplicates</span>(<span class="hljs-params">self, nums: List[int]</span>) -&gt; int:</span>
        unique_n = set(nums)
        <span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> unique_n:
            <span class="hljs-keyword">while</span> nums.count(n) &gt; <span class="hljs-number">2</span>:
                nums.remove(n)
        <span class="hljs-keyword">return</span> len(nums)  <span class="hljs-comment"># the judge expects k; the list is already truncated</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>This solution is not the best, I could tell, as the runtime came out at 4017 ms! An amusing number I had not seen before. There are two loops, <code>for</code> and <code>while</code> (and <code>count()</code> and <code>remove()</code> each scan the list), so it will be close to <em>O(n²)</em> at worst.</p>
<p>People had it done in a different way, with two pointers.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">removeDuplicates</span>(<span class="hljs-params">self, nums: List[int]</span>) -&gt; int:</span>

        k = <span class="hljs-number">2</span>

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">2</span>, len(nums)):
            <span class="hljs-keyword">if</span> nums[i] != nums[k - <span class="hljs-number">2</span>]:
                nums[k] = nums[i]
                k += <span class="hljs-number">1</span> 

        <span class="hljs-keyword">return</span> k
</code></pre>
<p>With this solution the <code>(k-2)</code>th and <code>i</code>th values are compared. The code can be visualised step by step like this.</p>
<pre><code class="lang-python"># step 1 (note: the loop starts at i = 2, and k starts at 2)
nums = [1, 1, 1, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
              i
              k
# i is 2 and k is 2
# nums[i] != nums[k - 2] compares 1 != 1 -&gt; False, nothing happens

# step 2
nums = [1, 1, 1, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
              k  i
# i is 3 and k is 2
# nums[i] != nums[k - 2] compares 2 != 1 -&gt; True
# nums[k] = nums[i] writes 2 into index 2, then k becomes 3
nums = [1, 1, 2, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
                 k  i

# step 3
nums = [1, 1, 2, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
                 k  i
# i is 4 and k is 3
# nums[i] != nums[k - 2] compares 2 != 1 -&gt; True
# nums[3] stays 2, k becomes 4

# step 4
nums = [1, 1, 2, 2, 2, 3]
 idx =&gt; 0  1  2  3  4  5
                    k  i
# i is 5 and k is 4
# nums[i] != nums[k - 2] compares 3 != 2 -&gt; True
# nums[4] = 3, k becomes 5 and is returned
nums = [1, 1, 2, 2, 3, 3]
</code></pre>
<p>This process continues until the end. It is quite a tricky algorithm to follow without visualisation. With two pointers, the time complexity is close to <em>O(n)</em>.</p>
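<p>To convince myself that the counting approach and the two-pointer approach agree, here is a quick randomised check. Both functions are lifted from the snippets above; the harness itself is my addition.</p>

```python
import random

def remove_dup_count(nums):
    # the count/remove approach, close to O(n^2)
    for n in set(nums):
        while nums.count(n) > 2:
            nums.remove(n)
    return len(nums)

def remove_dup_two_pointer(nums):
    # the two-pointer approach, close to O(n)
    k = 2
    for i in range(2, len(nums)):
        if nums[i] != nums[k - 2]:
            nums[k] = nums[i]
            k += 1
    return k

random.seed(0)
for _ in range(100):
    base = sorted(random.choices(range(5), k=random.randint(2, 12)))
    a, b = base[:], base[:]
    ka, kb = remove_dup_count(a), remove_dup_two_pointer(b)
    assert ka == kb and a[:ka] == b[:kb]  # same length, same prefix
```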
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 151 Reverse Words in a String]]></title><description><![CDATA[Understanding the Problem
A string is given with spaces. The string could be a single character or a complete sentence. It is to reverse the string by words. The spaces at the edges should be cut off.
# example 1
Input: s = "the sky is blue"
Output: ...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-151-reverse-words-in-a-string</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-151-reverse-words-in-a-string</guid><category><![CDATA[#Slicing]]></category><category><![CDATA[Python]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Fri, 28 Mar 2025 00:33:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743121999694/f2608574-4623-4dc3-b7a0-040bf70192eb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A string is given with spaces. The string could be a single character or a complete sentence. It is to reverse the string by words. The spaces at the edges should be cut off.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: s = <span class="hljs-string">"the sky is blue"</span>
Output: <span class="hljs-string">"blue is sky the"</span>

<span class="hljs-comment"># example 2</span>
Input: s = <span class="hljs-string">"  the sky is blue  "</span>
Output: <span class="hljs-string">"blue is sky the"</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I would first make it a list, then reverse it. Quite a simple problem, it seems.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I made the string into a list with <code>split()</code>, which automatically removes all the spaces and produces a word list; <code>s_list</code> below will hold <code>s_list = ["the", "sky", "is", "blue"]</code>. Afterwards, I reversed the list with <code>s_list[::-1]</code>. Slicing follows the sequence [start:stop:step]; when the step is negative, elements are taken in reverse order, so leaving start and stop empty selects every element and <code>-1</code> reverses them.</p>
<p>When the list is reversed, then I simply joined the list.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">reverseWords</span>(<span class="hljs-params">self, s: str</span>) -&gt; str:</span>
        s_list = s.split()
        <span class="hljs-keyword">return</span> <span class="hljs-string">" "</span>.join(s_list[::<span class="hljs-number">-1</span>])
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>If I had no knowledge of the built-in methods <code>split()</code> and <code>join()</code> and the slicing technique, this problem would have been a conundrum. Luckily my answer ran in 0 ms.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 392 Is Subsequence]]></title><description><![CDATA[Understanding the Problem
Given strings s and t, it is to tell whether s is the subsequence to t. It is not finding if the characters in s exist in t. Well basically it is, but the sequence matters too.
# example
Input: s = "abc", t = "ahbgdc"
Output...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-392-is-subsequence</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-392-is-subsequence</guid><category><![CDATA[subsequence]]></category><category><![CDATA[leetcode]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Wed, 26 Mar 2025 10:50:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742995570790/d51c4c73-cf23-413f-8cca-ef2da016a252.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>Given strings <code>s</code> and <code>t</code>, the task is to tell whether <code>s</code> is a subsequence of <code>t</code>. It is not just finding whether the characters of <code>s</code> exist in <code>t</code>; well, basically it is, but the order matters too.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example</span>
Input: s = <span class="hljs-string">"abc"</span>, t = <span class="hljs-string">"ahbgdc"</span>
Output: true
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>This would require a <code>for</code> loop to check each character. The loop will go through <code>t</code> as it may or may not contain <code>s</code>.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I declared <code>idx</code> and initialized it to 0, to start from the first index of <code>s</code>. Only when <code>c</code> from the loop over <code>t</code> is identical to <code>s[idx]</code> do I add one to <code>idx</code>, moving to the next character of <code>s</code>.</p>
<p>But it needs to check whether <code>idx</code> has reached the end of <code>s</code>. If it has, there is no need to check any further characters, so just return <code>True</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isSubsequence</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        idx = <span class="hljs-number">0</span>
        <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> t:
            <span class="hljs-keyword">if</span> idx &gt;= len(s):
                <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
            <span class="hljs-keyword">elif</span> c == s[idx]:
                idx += <span class="hljs-number">1</span>

        <span class="hljs-keyword">if</span> idx == len(s):
            <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>The fact that I was trying to solve this with <code>sort()</code> or <code>set()</code> was a bit hilarious, but it was worth seeing the many test cases that my code failed on. The submission ran in 0ms, which I am satisfied with.</p>
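<p>As a side note, the same forward-scanning idea can be written more compactly with Python iterators. This is only a sketch of an equivalent approach, and the helper name <code>is_subsequence</code> is mine, not LeetCode's method signature:</p>

```python
def is_subsequence(s: str, t: str) -> bool:
    # `in` on an iterator consumes it, so each character of s is
    # searched for only in the part of t after the previous match.
    it = iter(t)
    return all(c in it for c in s)
```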
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 242 Valid Anagram]]></title><description><![CDATA[Understanding the Problem
It is to find valid anagram. Anagram is a word, as for this problem, which can be rearranged to be formed another word. In this problem, s and t strings are given with characters each, and they can form exactly the same word...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-242-valid-anagram</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-242-valid-anagram</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Valid Anagram]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Tue, 25 Mar 2025 05:39:12 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>This problem asks for a valid anagram. An anagram, as far as this problem is concerned, is a word that can be rearranged to form another word. Strings <code>s</code> and <code>t</code> are given, and they may or may not consist of exactly the same characters. If <code>s</code> can be rearranged into <code>t</code>, or <code>t</code> into <code>s</code>, return <code>True</code>. If the characters in <code>s</code> and <code>t</code> do not match up, return <code>False</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: s = <span class="hljs-string">"anagram"</span>, t = <span class="hljs-string">"nagaram"</span>
Output: <span class="hljs-literal">True</span>

<span class="hljs-comment"># example 2</span>
Input: s = <span class="hljs-string">"anagram"</span>, t = <span class="hljs-string">"annagrmm"</span>
Output: <span class="hljs-literal">False</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I would solve the problem by simply sorting the two strings and comparing them.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isAnagram</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        <span class="hljs-comment"># 20ms runtime</span>
        s_sorted = sorted(s)
        t_sorted = sorted(t)

        <span class="hljs-keyword">if</span> s_sorted != t_sorted:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>This was simple, but it took a 20ms runtime because sorting both strings dominated the work. So I rewrote the code to make it run faster.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isAnagram</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        <span class="hljs-comment"># 7ms runtime</span>
        checked = []
        <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> s:
            <span class="hljs-keyword">if</span> c <span class="hljs-keyword">in</span> checked:
                <span class="hljs-keyword">continue</span>
            <span class="hljs-keyword">elif</span> s.count(c) == t.count(c):
                checked.append(c)
            <span class="hljs-keyword">else</span>:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>In this code, the <code>for</code> loop visits each character of <code>s</code> as <code>c</code>. Each character is counted in both <code>s</code> and <code>t</code>, and the counts are compared. If the counts are equal, <code>c</code> is put into <code>checked</code> so the same character is not counted again.</p>
<p>This code reduced the runtime by 13ms (20ms → 7ms).</p>
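<p>For reference, a common alternative that I did not use here is <code>collections.Counter</code>, which builds the character counts in a single pass per string:</p>

```python
from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    # Two strings are anagrams exactly when their character
    # frequency tables are equal.
    return Counter(s) == Counter(t)
```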
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>Well, I normally do not have to consider time complexity in easy problems like this, but the two sorting lines did not sit comfortably with me. It is better to practice the runtime efficiency that harder problems demand, I think.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 205 Isomorphic Strings]]></title><description><![CDATA[Understanding the Problem
There are strings as s and t. The length of s and length of t are equal as len(s) == len(t). This problem requires deciding whether s and t are isomorphic, which means they have the same structure, or pattern.
As an example,...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-205-isomorphic-strings</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-205-isomorphic-strings</guid><category><![CDATA[leetcode]]></category><category><![CDATA[hashmap]]></category><category><![CDATA[dictionary]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Mon, 24 Mar 2025 14:21:11 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>There are two strings, <code>s</code> and <code>t</code>, of equal length, so <code>len(s) == len(t)</code>. The problem asks whether <code>s</code> and <code>t</code> are isomorphic, which means they have the same structure, or pattern.</p>
<p>As an example, when there are strings <code>s = “add”</code> and <code>t = “egg”</code>, <code>”a”</code> could be substituted by <code>”e”</code>, and <code>”d”</code> could be substituted by <code>”g”</code>. If a mapping like <code>”a”</code> → <code>”e”</code> is not applied consistently, the strings are not isomorphic, thus return <code>False</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: s = <span class="hljs-string">"add"</span>, t = <span class="hljs-string">"egg"</span>
Output: <span class="hljs-literal">True</span>

<span class="hljs-comment"># example 2</span>
Input: s = <span class="hljs-string">"add"</span>, t = <span class="hljs-string">"ege"</span>
Output: <span class="hljs-literal">False</span>
</code></pre>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>This problem was somewhat similar to one of those problems I had solved already. First I declare a dictionary <code>d</code>, then add each key-value pair to it one by one if neither the key nor the value exists yet; otherwise return <code>False</code>.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I went with one <code>for</code> loop over the character pairs, plus a comprehension that collects the dictionary's values into <code>vals</code> on each iteration, so that <code>vals</code> can be used in the <code>if</code> conditions.</p>
<ul>
<li><p>The 1st <code>if</code> checks whether the key and value are already present in <code>d</code>. If neither exists, simply create the key-value pair in <code>d</code>.</p>
</li>
<li><p>The 2nd checks the case where the key does not exist in <code>d</code> but the value does.</p>
</li>
<li><p>The 3rd checks the case where the key exists in <code>d</code> but is mapped to a value different from the current one.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isIsomorphic</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        d = {}

        <span class="hljs-keyword">for</span> i, j <span class="hljs-keyword">in</span> zip(s, t):
            vals = [v <span class="hljs-keyword">for</span> v <span class="hljs-keyword">in</span> d.values()]

            <span class="hljs-keyword">if</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> vals:
                d[i] = j
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">in</span> vals:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> d[i] != j:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>The solution above seemed pretty okay, but rebuilding <code>vals</code> with a comprehension on every iteration was a little disturbing. It took 15ms to run the entire test suite. So I removed that inner loop and instead <code>append()</code> each new value to <code>vals</code>, so the values are not re-collected on every iteration.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isIsomorphic</span>(<span class="hljs-params">self, s: str, t: str</span>) -&gt; bool:</span>
        <span class="hljs-comment"># saved runtime by 11ms!</span>
        d = {}
        vals = []

        <span class="hljs-keyword">for</span> i, j <span class="hljs-keyword">in</span> zip(s, t):
            <span class="hljs-keyword">if</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> vals:
                d[i] = j
                vals.append(j)
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> j <span class="hljs-keyword">in</span> vals:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">elif</span> i <span class="hljs-keyword">in</span> d <span class="hljs-keyword">and</span> d[i] != j:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>This solution passed in 4ms, which is 11ms faster than the previous one.</p>
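<p>As an aside, a variant I have seen elsewhere (not the solution above) drops the <code>vals</code> list entirely by keeping a mapping in each direction, so the list membership test becomes two O(1) dictionary lookups:</p>

```python
def is_isomorphic(s: str, t: str) -> bool:
    fwd, bwd = {}, {}  # s -> t and t -> s mappings
    for a, b in zip(s, t):
        # setdefault stores the mapping on first sight and returns
        # the stored value; a mismatch in either direction fails.
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True
```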
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>A hashmap, or dictionary, problem is about checking keys and values, inserting, removing, and so on, I think. But considering the time complexity is also important. I had better practice reducing time complexity to avoid unnecessary memory usage.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 35 Search Insert Position]]></title><description><![CDATA[Understanding the Problem
This should be a typical binary search algorithm problem. This problem is meticulous about the time efficiency as this has to be solved with the time complex of O(log n).
There is a number target as an integer. It is to find...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-35-search-insert-position</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-35-search-insert-position</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Binary Search Algorithm]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 22 Mar 2025 07:59:13 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>This should be a typical binary search problem. It is meticulous about time efficiency, as it has to be solved with a time complexity of <em>O(log n)</em>.</p>
<p>There is an integer <code>target</code>. The task is to find the right spot for <code>target</code> in the list <code>nums</code>, where the numbers are sorted in ascending order like <code>[1, 3, 6, 7]</code>. If <code>target = 2</code>, the returned value should be <code>1</code>, as <code>target</code> belongs between 1 and 3, which puts it at index 1.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>Binary search was not something I had really considered before. I had always handled sorting problems with the <code>sorted()</code> function and search problems with the <code>index()</code> or <code>find()</code> methods.</p>
<p>My first idea was to find the middle as <code>half</code> and slice the list down as the loop went on. This was completely the wrong approach, because slicing throws away the original indices, so I could not recover the answer from it.</p>
<p>The final approach was to update <code>left</code> and <code>right</code> as index values. I had a little help from an LLM, though.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>Treating <code>left</code> and <code>right</code> as indices mitigated the complexity of the code.</p>
<ol>
<li><p>First, <code>left</code> and <code>right</code> were set to index values: the beginning and the end of the list.</p>
</li>
<li><p>The <code>while</code> loop stops once <code>left</code> passes <code>right</code>, so nothing is processed after <code>left</code> becomes bigger than <code>right</code>.</p>
</li>
<li><p>Take <code>half</code> as the middle index. If <code>target</code> is equal to the value <code>nums[half]</code>, return the index <code>half</code>.</p>
</li>
<li><p>If <code>target</code> is greater than <code>nums[half]</code>, <code>left</code> is updated to <code>half + 1</code>. This jumps the search past the middle so the binary approach is applied; the <code>+ 1</code> is needed because <code>nums[half]</code> itself has already been ruled out.</p>
<ul>
<li><p>For instance, imagine the list is <code>nums = [1, 3, 5, 7]</code> and <code>target</code> is <code>6</code>. In the first loop, <code>half</code> will be <code>(0 + 3) // 2</code>, which will be 1.</p>
</li>
<li><p><code>target</code> is greater than <code>nums[half]</code>, which is 3, so update <code>left</code> with <code>left = half + 1 = 2</code>, so the search resumes from index 2.</p>
</li>
<li><p>Then this time only <code>nums = [5, 7]</code> will be considered.</p>
</li>
</ul>
</li>
<li><p>In other cases, update <code>right</code> to <code>half - 1</code> to keep searching the left half.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">searchInsert</span>(<span class="hljs-params">self, nums: List[int], target: int</span>) -&gt; int:</span>
        left = <span class="hljs-number">0</span>
        right = len(nums) - <span class="hljs-number">1</span>

        <span class="hljs-keyword">while</span> left &lt;= right:
            half = (left + right) // <span class="hljs-number">2</span>
            <span class="hljs-keyword">if</span> target == nums[half]:
                <span class="hljs-keyword">return</span> half
            <span class="hljs-keyword">elif</span> target &gt; nums[half]:
                left = half + <span class="hljs-number">1</span>
            <span class="hljs-keyword">else</span>:
                right = half - <span class="hljs-number">1</span>
        <span class="hljs-keyword">return</span> left
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>From my perspective, algorithm problems often require searching methods with good time complexity. Binary search, with its <em>O(log n)</em> time complexity, is very typical, and I should be familiar with it. The key is to work with index values rather than slicing the actual list.</p>
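<p>For what it is worth, Python's standard library already implements this exact search: <code>bisect.bisect_left</code> returns the same insert position, which is handy for checking answers:</p>

```python
from bisect import bisect_left

# bisect_left returns the leftmost index at which target can be
# inserted while keeping nums sorted, matching searchInsert.
nums = [1, 3, 5, 7]
print(bisect_left(nums, 6))  # 3
print(bisect_left(nums, 5))  # 2
print(bisect_left(nums, 0))  # 0
```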
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 14 Longest Common Prefix]]></title><description><![CDATA[Understanding the Problem
It is simply to find the common prefix in between the strings in the list strs. strs contains strings like strs = ["flower","flow","flight"]. In this case the common prefix will be ”fl” which is included in every string in c...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-14-longest-common-prefix</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-14-longest-common-prefix</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Python]]></category><category><![CDATA[string]]></category><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Thu, 20 Mar 2025 08:36:24 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>It is simply to find the common prefix among the strings in the list <code>strs</code>. <code>strs</code> contains strings like <code>strs = ["flower","flow","flight"]</code>. In this case the common prefix is <code>”fl”</code>, which every string shares. When there is no common prefix, an empty string is returned.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>The only idea I had was to go with two nested <code>for</code> loops at <em>O(n²)</em>. The characters of the first string serve as the base against which the corresponding characters of every other string are compared.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>There is an edge case where <code>strs</code> holds only a single string, which may even be empty like <code>””</code>, so I return that string directly when the list length is 1.</p>
<p>Otherwise, common characters are piled up in <code>s = ““</code> one at a time. The code then takes the following steps.</p>
<ol>
<li><p>The outer loop runs over <code>strs[0]</code>, and an inner loop goes through each string as <code>word</code>.</p>
</li>
<li><p>Only while the index <code>i</code> is within the length of <code>word</code> do we check whether the character is common. If it is not common, return <code>s</code> to finish the code.</p>
</li>
<li><p>If <code>i</code> is equal or greater than <code>len(word)</code>, return <code>s</code> as it is no use running the code any further.</p>
</li>
<li><p>Other than that, add the character to <code>s</code>.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">longestCommonPrefix</span>(<span class="hljs-params">self, strs: List[str]</span>) -&gt; str:</span>
        <span class="hljs-keyword">if</span> len(strs) == <span class="hljs-number">1</span>:
            <span class="hljs-keyword">return</span> strs[<span class="hljs-number">0</span>]

        s = <span class="hljs-string">""</span>

        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(strs[<span class="hljs-number">0</span>])):
            <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> strs:
                <span class="hljs-keyword">if</span> i &lt; len(word): <span class="hljs-comment"># only when index is within range</span>
                    <span class="hljs-keyword">if</span> word[i] != strs[<span class="hljs-number">0</span>][i]:
                        <span class="hljs-keyword">return</span> s
                <span class="hljs-keyword">elif</span> i &gt;= len(word):
                    <span class="hljs-keyword">return</span> s
            s += strs[<span class="hljs-number">0</span>][i]
        <span class="hljs-keyword">return</span> s
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>I have looked up other solutions on LeetCode, and it seems most people chose to solve the problem at <em>O(n²)</em>, just like I did. I believe there should be a better solution, though.</p>
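<p>One tidier variant I am aware of (not taken from the problem's editorial) leans on <code>zip(*strs)</code>, which groups the i-th characters of all strings together and stops automatically at the shortest string, removing the manual bounds check:</p>

```python
def longest_common_prefix(strs):
    prefix = []
    # zip(*strs) yields tuples of the i-th character of every word,
    # ending at the shortest word.
    for chars in zip(*strs):
        if len(set(chars)) != 1:  # the characters disagree here
            break
        prefix.append(chars[0])
    return "".join(prefix)
```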
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 70 Climbing Stairs]]></title><description><![CDATA[Understanding the Problem
A number is given as n. Imagine that only 1 or 2 steps could be taken at once to reach the number n. For instance, if the number is 4, there are 5 ways to reach number 4.
n = 4

# ways to reach n
1, 1, 1, 1 # 1
2, 1, 1 # 2
1...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-70-climbing-stairs</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-70-climbing-stairs</guid><category><![CDATA[leetcode]]></category><category><![CDATA[patterns]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Tue, 18 Mar 2025 11:21:08 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A number is given as <code>n</code>. Imagine that only 1 or 2 steps could be taken at once to reach the number <code>n</code>. For instance, if the number is 4, there are 5 ways to reach number 4.</p>
<pre><code class="lang-python">n = <span class="hljs-number">4</span>

<span class="hljs-comment"># ways to reach n</span>
<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span> <span class="hljs-comment"># 1</span>
<span class="hljs-number">2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span> <span class="hljs-comment"># 2</span>
<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span> <span class="hljs-comment"># 3</span>
<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span> <span class="hljs-comment"># 4</span>
<span class="hljs-number">2</span>, <span class="hljs-number">2</span> <span class="hljs-comment"># 5</span>
</code></pre>
<p>So the return number should be 5.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I attempted to solve the problem with a mathematical approach, but I could not find a pattern by dividing by 2 or other tricks.</p>
<p>After some trials of manually enumerating the ways to reach the number, I found a pattern: each count is the sum of the previous two counts. With this hint, I used a loop to complete the task.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>For the first three numbers, the answer is <code>n</code> itself, so I return <code>n</code> when it is less than or equal to 3. Then I took the steps as follows.</p>
<ol>
<li><p>Beyond that, the number is at least 4, so I declared <code>nums = [1, 2, 3]</code> first, and <code>count = 3</code></p>
</li>
<li><p>In the <code>while</code> loop, it appends to <code>nums</code> the sum of the last two numbers by accessing with <code>nums[-1]</code> and <code>nums[-2]</code></p>
</li>
<li><p>Eventually return <code>nums[-1]</code></p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">climbStairs</span>(<span class="hljs-params">self, n: int</span>) -&gt; int:</span>
        <span class="hljs-keyword">if</span> n &lt;= <span class="hljs-number">3</span>:
            <span class="hljs-keyword">return</span> n

        nums = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]
        count = <span class="hljs-number">3</span>

        <span class="hljs-keyword">while</span> count &lt; n:
            nums.append(nums[<span class="hljs-number">-1</span>] + nums[<span class="hljs-number">-2</span>])
            count += <span class="hljs-number">1</span>

        <span class="hljs-keyword">return</span> nums[<span class="hljs-number">-1</span>]
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>It took some time to solve the problem until I figured out the pattern of the steps. I also learned that manually enumerating cases to look for a pattern can help find the solution in problems like this, though it is not always the best practice.</p>
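<p>As a footnote, the list is not strictly necessary; the same recurrence fits in two variables. This is a sketch of a constant-space variant of the loop above:</p>

```python
def climb_stairs(n: int) -> int:
    a, b = 1, 2  # ways to reach step 1 and step 2
    for _ in range(n - 1):
        # Each step is reachable from the previous step (one hop)
        # or the step before that (two hops).
        a, b = b, a + b
    return a
```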
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 290 Word Pattern]]></title><description><![CDATA[Understanding the Problem
There are two strings given. One is pattern, and another one is a string with spaces. If the pattern is ”abba” and the string is ”dog cat cat dog”, the first index ”a” takes ”dog”, the second index ”b” takes ”cat”, the third...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-290-word-pattern</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-290-word-pattern</guid><category><![CDATA[Python]]></category><category><![CDATA[dictionary]]></category><category><![CDATA[hashmap]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Sat, 15 Mar 2025 09:03:40 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>There are two strings given. One is pattern, and another one is a string with spaces. If the pattern is <code>”abba”</code> and the string is <code>”dog cat cat dog”</code>, the first index <code>”a”</code> takes <code>”dog”</code>, the second index <code>”b”</code> takes <code>”cat”</code>, the third index <code>”b”</code> takes <code>”cat”</code> again, the fourth index <code>”a”</code> takes <code>”dog”</code>. The <code>”a”</code> all matches with <code>”dog”</code> and all <code>”b”</code> matches with <code>”cat”</code>, so it returns <code>True</code>.</p>
<p>See the example below.</p>
<pre><code class="lang-python"><span class="hljs-comment"># example 1</span>
Input: pattern = <span class="hljs-string">"abba"</span>, s = <span class="hljs-string">"dog cat cat dog"</span>
Output: <span class="hljs-literal">True</span>

<span class="hljs-comment"># example 2</span>
Input: pattern = <span class="hljs-string">"abba"</span>, s = <span class="hljs-string">"dog cat dog dog"</span>
Output: <span class="hljs-literal">False</span>
</code></pre>
<p>The second example returns <code>False</code> because <code>”b”</code> patterns have two different values <code>”cat”</code> and <code>”dog”</code>, so the patterns do not have relative values.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>It is a hashmap problem. I would use a hashmap and check whether the key-value pair formed by each pattern character and word matches what is already in the dictionary.</p>
<ol>
<li><p>Use dictionary</p>
</li>
<li><p>loop the pattern and check the dictionary</p>
</li>
</ol>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>I had to declare the dictionary first and fill it up one pair at a time to check each subsequent key and value of the pattern. As an exception, if the length of <code>pattern</code> and the number of words in <code>s</code> are not equal, I returned <code>False</code>.</p>
<p>The key to solving the problem is like this.</p>
<ol>
<li><p><code>for</code> looping with <code>zip(pattern, s.split())</code> to check key and value in pair</p>
</li>
<li><p>Get all values and keys of the dictionary in <code>vals</code> and <code>keys</code></p>
</li>
<li><p>Check with conditions</p>
</li>
</ol>
<ul>
<li><p>if the pattern character is not among the dictionary keys, and the word is not among its values either, simply add the key and value</p>
</li>
<li><p>if the pattern character is not among the keys but the word is already a value, the pattern does not match, so return <code>False</code>. For example, if the dictionary holds <code>d = {"a": "dog"}</code> and the current pattern character is <code>”b”</code> with value <code>”dog”</code>, the pattern does not match the current dictionary</p>
</li>
<li><p>if the pattern character is among the keys but the word is not equal to its stored value, return <code>False</code>. For example, if the dictionary holds <code>d = {"a": "dog"}</code> and the current pattern character is <code>“a”</code> with value <code>”cat”</code>, the pattern does not match</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">wordPattern</span>(<span class="hljs-params">self, pattern: str, s: str</span>) -&gt; bool:</span>
        <span class="hljs-keyword">if</span> len(pattern) != len(s.split()):
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        d = {}

        <span class="hljs-keyword">for</span> p, w <span class="hljs-keyword">in</span> zip(pattern, s.split()):
            vals = d.values()
            keys = d.keys()

            <span class="hljs-keyword">if</span> p <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> keys <span class="hljs-keyword">and</span> w <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> vals:
                d[p] = w
            <span class="hljs-keyword">elif</span> p <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> keys <span class="hljs-keyword">and</span> w <span class="hljs-keyword">in</span> vals:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">elif</span> p <span class="hljs-keyword">in</span> keys <span class="hljs-keyword">and</span> d[p] != w:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>It runs in <em>O(n)</em> time, and the solution is acceptable, I think. The long chain of <code>elif</code> conditions makes me a little uncomfortable, but other people chose the same way to solve the problem.</p>
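<p>For comparison, the <code>elif</code> chain can be avoided entirely with a one-pass set comparison. This is a sketch of a well-known alternative, not the submitted solution: the mapping is one-to-one exactly when the counts of distinct letters, distinct words, and distinct (letter, word) pairs all agree.</p>
<pre><code class="lang-python">def word_pattern(pattern, s):
    words = s.split()
    # the mapping is a bijection exactly when the numbers of distinct
    # letters, distinct words, and distinct (letter, word) pairs agree
    return (len(pattern) == len(words)
            and len(set(pattern)) == len(set(words)) == len(set(zip(pattern, words))))
</code></pre>
<p>For instance, <code>word_pattern("abba", "dog cat cat dog")</code> is <code>True</code>, while <code>"abba"</code> against <code>"dog dog dog dog"</code> fails because two letters map to one word.</p>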
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 28 Find the Index of the First Occurrence in a String]]></title><description><![CDATA[Understanding the Problem
There are two strings given as haystack and needle. needle is the string that could be included in haystack or not included. If it is included, return the first index that needle occurs. If it is not included in haystack, re...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-28-find-the-index-of-the-first-occurrence-in-a-string</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-28-find-the-index-of-the-first-occurrence-in-a-string</guid><category><![CDATA[leetcode]]></category><category><![CDATA[string]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Fri, 14 Mar 2025 15:50:09 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>There are two strings given as <code>haystack</code> and <code>needle</code>. <code>needle</code> is the string that could be included in <code>haystack</code> or not included. If it is included, return the first index that <code>needle</code> occurs. If it is not included in <code>haystack</code>, return <code>-1</code>.</p>
<p>For example, if <code>haystack = "sadbutsad"</code> and <code>needle = "sad"</code>, the string <code>"sad"</code> occurs at indexes <code>0</code> and <code>6</code>, so return <code>0</code>.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>My approach is to check whether <code>needle</code> is in <code>haystack</code>, then find the first index.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<ol>
<li><p>Check if <code>needle</code> is in <code>haystack</code> by <code>if needle in haystack</code> clause</p>
</li>
<li><p>If it exists in <code>haystack</code>, then use <code>find()</code> method to get the first index of the <code>needle</code> and return it</p>
</li>
<li><p>If it does not exist in <code>haystack</code>, return <code>-1</code></p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">strStr</span>(<span class="hljs-params">self, haystack: str, needle: str</span>) -&gt; int:</span>
        <span class="hljs-keyword">return</span> haystack.find(needle) <span class="hljs-keyword">if</span> needle <span class="hljs-keyword">in</span> haystack <span class="hljs-keyword">else</span> <span class="hljs-number">-1</span>
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>I was wondering if I should use two pointers to solve this problem, but there was no need, since Python's built-in functions and methods handle it.</p>
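<p>For reference, a manual scan without <code>find()</code> might look like this (a sketch, not the submitted solution): slide a window of <code>len(needle)</code> characters across <code>haystack</code> and compare each window against <code>needle</code>.</p>
<pre><code class="lang-python">def str_str(haystack, needle):
    n, m = len(haystack), len(needle)
    # slide a window of length m across haystack
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i  # first index where needle occurs
    return -1         # needle not found
</code></pre>
<p>So <code>str_str("sadbutsad", "sad")</code> returns <code>0</code>, matching the built-in behavior.</p>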
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 58 Length of Last Word]]></title><description><![CDATA[Understanding the Problem
A string is given. It has spaces in between. The task is to return the last string’s length. If the string is s = “Hello world”, the last string will be ”world” and its length will be 5, so return 5.
Approach
My approach wil...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-58-length-of-last-word</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-58-length-of-last-word</guid><category><![CDATA[Python]]></category><category><![CDATA[leetcode]]></category><category><![CDATA[string]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Thu, 13 Mar 2025 05:07:18 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A string is given, with spaces in between. The task is to return the length of the last word. If the string is <code>s = "Hello world"</code>, the last word is <code>"world"</code> and its length is 5, so return 5.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>My approach will be to <code>split()</code>, get the last index and return length of it.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>It was a simple problem. It would take a few lines without the help of built-in functions, but with them it takes just one line.</p>
<p>First, split the string by space, then get the last index with <code>[-1]</code>, then get the length of the last index with <code>len()</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lengthOfLastWord</span>(<span class="hljs-params">self, s: str</span>) -&gt; int:</span>
        <span class="hljs-keyword">return</span> len(s.split()[<span class="hljs-number">-1</span>])
</code></pre>
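<p>The few-line version without the built-ins might look like this (a sketch for comparison): walk backwards past any trailing spaces, then count characters until the next space.</p>
<pre><code class="lang-python">def length_of_last_word(s):
    i = len(s) - 1
    # skip any trailing spaces
    while i >= 0 and s[i] == " ":
        i -= 1
    # count the characters of the last word
    length = 0
    while i >= 0 and s[i] != " ":
        length += 1
        i -= 1
    return length
</code></pre>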
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>Nothing much to say about this one. I was happy to see that this kind of problem could show up in interviews.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 121 Best Time to Buy and Sell Stock]]></title><description><![CDATA[Understanding the Problem
It is to find the best time to sell the stock with the max profit. A given list indicates the stock prices. Each element is a stock price in time series. For instance, if prices = [7,1,5,3,6,4], the best time to buy the stoc...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-121-best-time-to-buy-and-sell-stock</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-121-best-time-to-buy-and-sell-stock</guid><category><![CDATA[Python]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Wed, 12 Mar 2025 14:38:48 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>The task is to find the best times to buy and sell the stock for the maximum profit. A given list indicates the stock prices; each element is a stock price in a time series. For instance, if <code>prices = [7,1,5,3,6,4]</code>, the best time to buy is when the price is <code>1</code> and the best time to sell is when the price is <code>6</code>, so the profit is <code>5</code>, which is the number to return.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>My idea was to loop over the list twice with nested <code>for</code> loops to find the best profit. But I faced the <em>time limit</em> error and could not pass with <em>O(n²)</em>; I had to bring it down to <em>O(n)</em>.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>Well, my goal was to track the minimum price and the maximum difference to find the best profit. So first initialize <code>min_num</code> with <code>float("inf")</code>. The code then loops over the prices one by one.</p>
<p>If the element <code>p</code> is lower than <code>min_num</code>, set <code>min_num</code> to <code>p</code> for the subtraction comparison. If <code>p - min_num</code> is greater than <code>max_diff</code>, which means the profit is greater, set <code>max_diff</code> to <code>p - min_num</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">maxProfit</span>(<span class="hljs-params">self, prices: List[int]</span>) -&gt; int:</span>
        min_num = float(<span class="hljs-string">"inf"</span>)
        max_diff = <span class="hljs-number">0</span>

        <span class="hljs-keyword">for</span> p <span class="hljs-keyword">in</span> prices:
            <span class="hljs-keyword">if</span> p &lt; min_num:
                min_num = p
            <span class="hljs-keyword">elif</span> p - min_num &gt; max_diff:
                max_diff = p - min_num
        <span class="hljs-keyword">return</span> max_diff
</code></pre>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>The code was simple, but it took me some time to solve the problem. I was busy visiting my parents and looking for other jobs to move to, so I could not focus on solving algorithm problems. Anyway, it came down to time efficiency this time again.</p>
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 169 Majority Element]]></title><description><![CDATA[Understanding the Problem
It is a simple task to return the number which appears the most in the list nums. If the input is nums = [2,2,1,1,1,2,2], the number that appears the most will be 2, thus return 2.
Approach
I would first acquire the unique n...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-169-majority-element</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-169-majority-element</guid><category><![CDATA[Python]]></category><category><![CDATA[algorithms]]></category><category><![CDATA[leetcode]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Fri, 07 Mar 2025 08:33:10 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>It is a simple task to return the number which appears the most in the list <code>nums</code>. If the input is <code>nums = [2,2,1,1,1,2,2]</code>, the number that appears the most will be <code>2</code>, thus return <code>2</code>.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I would first acquire the unique numbers from <code>nums</code> and count how many times each of them appears in the list <code>nums</code>. Then return the number that appears the most.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">majorityElement</span>(<span class="hljs-params">self, nums: List[int]</span>) -&gt; int:</span>
        unique = list(set(nums))
        max_num = [nums.count(i) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> unique]
        idx = max_num.index(max(max_num))

        value = unique[idx]
        <span class="hljs-keyword">return</span> value
</code></pre>
<p>As simple as that: get the unique numbers with <code>set()</code> and convert the result to a <code>list()</code> so the elements can be indexed. Then count how many times each unique number appears. Finally, find the position of the largest count with <code>max_num.index(max(max_num))</code> and return the number at that position in <code>unique</code>.</p>
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>I thought there would be a better solution to this, but other people solved it pretty much the same way. It did not take too long, as it was rated <em>easy</em> by LeetCode.</p>
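<p>For what it's worth, there is a well-known <em>O(1)</em>-space alternative, the Boyer–Moore voting algorithm (a sketch, not what I submitted): keep a single candidate and a counter, and rely on the problem's guarantee that a majority element always exists.</p>
<pre><code class="lang-python">def majority_element(nums):
    count = 0
    candidate = None
    for n in nums:
        if count == 0:
            # adopt a new candidate whenever the counter is exhausted
            candidate = n
        count += 1 if n == candidate else -1
    # the guaranteed majority element always survives the voting
    return candidate
</code></pre>
<p>Because the majority element appears more than <code>len(nums) // 2</code> times, it can never be fully canceled out, so it is always the final candidate.</p>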
]]></content:encoded></item><item><title><![CDATA[[LeetCode] Top Interview 150 Problems Solving # 20 Valid Parentheses]]></title><description><![CDATA[Understanding the Problem
A given string with brackets have three types of brackets; brackets (), braces {} and squared brackets []. It is to return True if the brackets are paired to open and close in a correct way, like (){}[] or {[()]}. Other stri...]]></description><link>https://ramieeee.me/leetcode-top-interview-150-problems-solving-20-valid-parentheses</link><guid isPermaLink="true">https://ramieeee.me/leetcode-top-interview-150-problems-solving-20-valid-parentheses</guid><category><![CDATA[leetcode]]></category><category><![CDATA[Python]]></category><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Ramhee Yeon]]></dc:creator><pubDate>Thu, 06 Mar 2025 08:17:50 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-understanding-the-problem"><strong>Understanding the Problem</strong></h1>
<p>A given string contains three types of brackets: parentheses <code>()</code>, braces <code>{}</code> and square brackets <code>[]</code>. Return <code>True</code> if the brackets are opened and closed in the correct order, like <code>(){}[]</code> or <code>{[()]}</code>. Other strings like <code>[(])</code> or <code>(()</code> are considered <em>not paired</em>, so <code>False</code> is returned.</p>
<h1 id="heading-approach"><strong>Approach</strong></h1>
<p>I considered using a dictionary with the opening brackets as keys and the closing brackets as values, so I would know which opening and closing brackets pair up. Then use a stack to compare the values.</p>
<h1 id="heading-solution"><strong>Solution</strong></h1>
<p>It was not that easy until I figured out that this is a <em>stack</em> problem. I first attempted to slice the string from an opening bracket to its closing bracket and check whether the characters at those indexes matched, but that would not pass all the test cases, as it disregards the brackets in the middle of the sliced portion.</p>
<p>Using a stack was the right idea, and I wrote the code in these steps.</p>
<ol>
<li><p>If the string has an odd length or a closing bracket comes first, return <code>False</code></p>
</li>
<li><p>Go through a <code>for</code> loop, and push the opening brackets onto the stack</p>
</li>
<li><p>When the character <code>s[i]</code> is not an opening bracket, check it against the last element of the stack to see whether the key and value match. If not, return <code>False</code></p>
</li>
<li><p>If the closing bracket <code>s[i]</code> matches the opening bracket from the stack, pop it from the stack</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Solution</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">isValid</span>(<span class="hljs-params">self, s: str</span>) -&gt; bool:</span>
        <span class="hljs-keyword">if</span> len(s) % <span class="hljs-number">2</span> != <span class="hljs-number">0</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

        d = {
            <span class="hljs-string">"("</span>: <span class="hljs-string">")"</span>,
            <span class="hljs-string">"{"</span>: <span class="hljs-string">"}"</span>,
            <span class="hljs-string">"["</span>: <span class="hljs-string">"]"</span>
        }

        stack = []
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(s)):
            <span class="hljs-keyword">if</span> len(stack) == <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> s[i] <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> d.keys():
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>

            <span class="hljs-keyword">if</span> s[i] <span class="hljs-keyword">in</span> d.keys():
                stack.append(s[i])
            <span class="hljs-keyword">elif</span> s[i] != d[stack[<span class="hljs-number">-1</span>]]:
                <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
            <span class="hljs-keyword">else</span>:
                stack.pop()

        <span class="hljs-keyword">if</span> len(stack) != <span class="hljs-number">0</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
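<p>The same stack idea can also be written a bit more compactly by keying the dictionary on the <em>closing</em> brackets instead, so the pop-and-compare becomes a single step. This is a sketch for comparison, not the submitted solution.</p>
<pre><code class="lang-python">def is_valid(s):
    # key on the closing brackets so each close maps to its expected opening
    pairs = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in s:
        if ch in pairs:
            # closing bracket: the stack top must be its opening partner
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            # opening bracket: remember it for later
            stack.append(ch)
    # valid only if every opening bracket was closed
    return not stack
</code></pre>
<p>This also handles the odd-length and closing-bracket-first cases without special checks, since an unmatched close or a leftover open fails naturally.</p>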
<h1 id="heading-reflection"><strong>Reflection</strong></h1>
<p>It is always a matter of finding the right algorithm to solve a problem. I have a tendency to spend time just coding without thinking much about the intent of the problem, but that does not always work well. I learned again that settling on the right algorithm first is necessary.</p>
]]></content:encoded></item></channel></rss>