Introduction
Batch Normalization (BatchNorm) is a simple yet remarkably effective technique that makes training neural networks faster and more stable. Despite its widespread adoption, the theoretical justification for BatchNorm has remained vague and shaky. The belief circulating in the ML community is that BatchNorm improves optimization by reducing internal covariate shift (ICS). As we shall see, ICS has little to no effect on optimization.

This blog post examines the proposed explanations of why BatchNorm works, largely agreeing with the conclusions of How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) [1]. This work joins the effort to make reproducibility and open-source code commonplace in ML by reproducing the results from [1] live in your browser (thanks to TensorFlow.js). To see the results, you will have to train the models from scratch, which is as easy as clicking a button. Parameter initialization is random, so you will see different results every time you train the models. The source code for the models presented here can be found in this GitHub repo.
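As a refresher before the discussion, here is a minimal NumPy sketch of the training-time BatchNorm transform (normalize each feature over the batch, then apply a learned scale and shift). The function name and shapes are illustrative; the in-browser models in this post are built with TensorFlow.js rather than this code.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-norm forward pass for a 2-D batch of shape (batch_size, features).

    Each feature is normalized to zero mean and unit variance over the batch,
    then scaled by gamma and shifted by beta (both learned parameters).
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # scale and shift

# Example: a random batch of 4 samples with 3 features.
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))  # approximately 0 for each feature
print(y.std(axis=0))   # approximately 1 for each feature
```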