Saturday, May 27, 2023

How to Run a Large Language Model on Your Raspberry Pi

makeuseof.com

David Rutland

Large language models, known generally (and inaccurately) as AIs, have been threatening to upend the publishing, art, and legal world for months. One downside is that using LLMs such as ChatGPT means creating an account and having someone else's computer do the work. But you can run a trained LLM on your Raspberry Pi to write poetry, answer questions, and more.

What Is a Large Language Model?

Large language models use machine learning algorithms to find relationships and patterns between words and phrases. Trained on vast quantities of data, they are able to predict what words are statistically likely to come next when given a prompt.

If you were to ask thousands of people how they were feeling today, the responses would be along the lines of, "I'm fine", "Could be worse", "OK, but my knees are playing up". The conversation would then turn in a different direction. Perhaps the person would ask about your own health, or follow up with "Sorry, I've got to run. I'm late for work".

Given this data and the initial prompt, a large language model should be able to come up with a convincing and original reply of its own, based on the likelihood of a certain word coming next in a sequence, combined with a preset degree of randomness, repetition penalties, and other parameters.
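
To make those knobs concrete, here's a minimal Python sketch of temperature and a repetition penalty at work. The words, probabilities, and parameter values are all invented for illustration; a real model scores tens of thousands of tokens this way at every step:

import random

# Hypothetical next-word probabilities for the prompt "How are you feeling today?"
candidates = {"fine": 0.40, "okay": 0.25, "great": 0.15, "terrible": 0.10, "purple": 0.10}

def sample_next_word(probs, temperature=0.8, already_used=(), repeat_penalty=1.3):
    scores = {}
    for word, p in probs.items():
        if word in already_used:
            p /= repeat_penalty  # repetition penalty: discourage words already produced
        scores[word] = p ** (1.0 / temperature)  # temperature < 1 sharpens, > 1 flattens
    words = list(scores)
    weights = [scores[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]  # weighted random pick

print(sample_next_word(candidates, already_used=("fine",)))

Run it a few times and you'll mostly see "fine" or "okay", with the odd surprise, which is exactly the kind of controlled randomness described above.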

The large language models in use today aren't trained on a vox pop of a few thousand people. Instead, they're given an unimaginable amount of data, scraped from publicly available collections, social media platforms, web pages, archives, and the occasional custom dataset.

LLMs are trained by human researchers who will reinforce certain patterns and feed them back to the algorithm. When you ask a large language model "what is the best kind of dog?", it'll be able to spin an answer telling you that a Jack Russell terrier is the best kind of dog, and give you reasons why.

But regardless of how intelligent the answer is, or how convincingly and humanly dumb, neither the model nor the machine it runs on has a mind, and both are incapable of understanding either the question or the words that make up the response. It's just math and a lot of data.

Why Run a Large Language Model on Raspberry Pi?

Large language models are everywhere, and are being adopted by large search companies to assist in answering queries.

While it's tempting to throw a natural language question at a corporate black box, sometimes you want to search for inspiration or ask a question without feeding yet more data into the maw of surveillance capitalism.

As an experimental board for tinkerers, the Raspberry Pi single-board computer is philosophically, if not physically, suited to the endeavor.

In February 2023, Meta (the company formerly known as Facebook) announced LLaMA, a new LLM offering language models ranging from 7 billion to 65 billion parameters. LLaMA was trained using publicly available datasets.

The LLaMA code is open source, meaning that anyone can use and adapt it, and the 'weights' or parameters were posted as torrents and magnet links in a thread on the project's GitHub page.

In March 2023, developer Georgi Gerganov released llama.cpp, which can run on a huge range of hardware, including Raspberry Pi. The code runs locally, and no data is sent to Meta.

Install llama.cpp on Raspberry Pi

There are no published hardware guidelines for llama.cpp, but it is extremely processor, RAM, and storage hungry. Make sure that you're running it on a Raspberry Pi 4B or 400 with as much memory, virtual memory, and SSD space available as possible. An SD card isn't going to cut it, and a case with decent cooling is a must.
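
On Raspberry Pi OS, one way to maximize virtual memory is to enlarge the swap file managed by dphys-swapfile. The 2048MB figure below is our suggestion rather than an official requirement:

 sudo dphys-swapfile swapoff
 sudo nano /etc/dphys-swapfile

Set CONF_SWAPSIZE=2048, save with Ctrl + O, exit with Ctrl + X, then re-create and enable the swap file:

 sudo dphys-swapfile setup
 sudo dphys-swapfile swapon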

We're going to be using the 7 billion parameter model, so visit this LLaMA GitHub thread, and download the 7B torrent using a client such as qBittorrent or Aria.

Clone the llama.cpp repository and then use the cd command to move into the new directory:

 git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

If you don't have a compiler installed, install one now with:

 sudo apt-get install g++ 

Now compile the project with this command:

 make 

There's a chance that llama.cpp will fail to compile, and you'll see a bunch of error messages relating to "vdotq_s32". If this happens, you need to revert a commit. First, set your local git identity; git needs a name and an email address to create the revert commit, and any values will do:

 git config user.name "david"
 git config user.email "david@example.com"

Now you can revert a previous commit:

 git revert 84d9015 

A git commit message will open in the nano text editor. Press Ctrl + O to save, then Ctrl + X to exit nano. llama.cpp should now compile without errors when you enter:

 make 

You'll need to create a directory for the model weights you intend to use:

 mkdir models 

Now transfer the model weights from the LLaMA directory:

 mv ~/Downloads/LLaMA/* ~/llama.cpp/models/ 

Make sure you have Python 3 installed on your Pi, and install the llama.cpp dependencies:

 python3 -m pip install torch numpy sentencepiece 

The NumPy version may cause issues. Upgrade it:

 pip install numpy --upgrade 

Now convert the 7B model to ggml FP16 format:

 python3 convert-pth-to-ggml.py models/7B/ 1 

The previous step is extremely memory intensive and, by our reckoning, uses at least 16GB RAM. It's also super slow and prone to failure.

You will get better results by following these same instructions on a more powerful desktop PC, then copying the file /models/7B/ggml-model-q4_0.bin to the same location on your Raspberry Pi.
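
For example, assuming your Pi uses the default pi user and raspberrypi.local hostname (adjust both to match your setup), you could copy the model file over the network with:

 scp models/7B/ggml-model-q4_0.bin pi@raspberrypi.local:~/llama.cpp/models/7B/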

Quantize the model to 4 bits:

 ./quantize.sh 7B 
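
If you're curious what quantization actually does, here is a much-simplified Python sketch: each weight is mapped to a small integer that fits in 4 bits, plus a shared scale factor. llama.cpp's real q4_0 format works on blocks of weights, so treat this as illustration only:

# Toy 4-bit quantization: store small integers plus one scale factor
weights = [0.12, -0.40, 0.33, -0.05, 0.27]
scale = max(abs(w) for w in weights) / 7   # 4 signed bits give roughly -8..7

quantized = [round(w / scale) for w in weights]  # e.g. [2, -7, 6, -1, 5]
restored = [q * scale for q in quantized]        # approximate original weights

print(quantized)
print(restored)

The model becomes four times smaller than 16-bit weights at the cost of a little precision, which is what makes it feasible to load on a Raspberry Pi.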

That's it. LLaMA LLM is now installed on your Raspberry Pi, and ready to use!

Using llama.cpp on Raspberry Pi

To get started with llama.cpp, make sure you're in the project directory and enter the following command:
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 -p "What is the Linux Kernel?" 

The -m option points llama.cpp to the model you want it to use; -t sets the number of threads; -n is the number of tokens to predict; -p is your prompt. You can customize any and all of these variables.

Settle in for a long wait, because while the Raspberry Pi is excellent at what it does, it wasn't designed for this kind of CPU activity. In our example prompt, llama broke the text down into eight individual tokens, before giving the following response:

Linux Kernel, in a nutshell, has always been referred to as an operating system. However it’s not really considered one since for all practical purposes most of its functions are exposed through APIs and utilities which can be used by any other OS that wants (or needs) access those API(s). As far Linux Kernel is concerned itself there isn't a whole lot to do with the actual kernel. It’ll just start, receive configuration parameters from your bootloader/initrd of choice then get it going on some hardware and you go about whatever business that entails at this point

While this was a fairly coherent answer, a second attempt resulted in something resembling roughly chopped word salad:

Most of you have probably heard about it but not really know what they are talking about. We will be discussing this in details because understanding them fully helps us to use our computers more efficiently and also make better decisions when buying new hardware or software for your PCs at home, offices etc.. The Linux Kernel is the backbone of most operating systems that runs on a computer system such as android which is an open source Operating System based in part from this kernel. But what exactly do they mean by saying linux kernal?

The Linus Torvalds has written down his thoughts (Knowledge)

It's worth mentioning at this point that each of these responses took close to two hours to produce, and had all four CPU cores working at 100% throughout. Work on the project is moving fast, however, and you can expect upcoming ARM optimizations to bring the time down.

You can also simulate entire conversations with your LLM instead of feeding it prompts one at a time. Adding the -i option will start llama in interactive mode, while --interactive-start will ask you for input at the start. For a full list of available options, run:

 ./main -h 
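
For instance, to start a back-and-forth session using the same model and thread settings as before, you might run:

 ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -i --interactive-start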

Be aware that LLaMA doesn't have restrictive rules. It will, on occasion, be sexist, racist, homophobic, and very wrong.

A Large Language Model Is No Substitute for Real Knowledge

Running Meta's LLaMA on Raspberry Pi is insanely cool, and you may be tempted to turn to your virtual guru for technical questions, life advice, friendship, or as a real source of knowledge. Don't be fooled. Large language models know nothing, feel nothing, and understand nothing. If you need help with something, it's better to talk to a human being or to read something written by a human being.

If you're short of time, you could speed-read it in your Linux terminal!

makeuseof.com

Bard vs. ChatGPT vs. Offline Alpaca: Which Is the Best LLM?

David Rutland

Large language models (LLMs) come in all shapes and sizes, and will assist you in any way you see fit. But which is best? We put the dominant AIs from Alphabet, OpenAI, and Meta to the test.

What You Need to Know About AI Chatbots

Artificial general intelligence has been a goal of computer scientists for decades, and AI has served as a mainstay for science fiction writers and moviemakers for even longer.

AGI exhibits intelligence similar to human cognitive capabilities, and the Turing Test—a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human—has remained almost unchallenged in the seven decades since it was first laid out.

The recent convergence of extremely large-scale computing, vast quantities of money, and the astounding volume of information freely available on the open internet allowed tech giants to train models which can predict the next word fragment—or token—in a sequence of tokens.

At the time of writing, both Google's Bard and OpenAI's ChatGPT are available for you to use and test through their web interfaces.

Meta's language model, LLaMA, is not available on the web, but you can easily download and run LLaMA on your own hardware and use it through a command line, or run Dalai—one of several apps with a user-friendly interface—on your own machine.

For the purposes of the test, we'll be running Stanford University's Alpaca 7B model—an adaptation of LLaMA—and pitting it against Bard and ChatGPT.

The following comparisons and tests are not meant to be exhaustive but rather give you an indication of key points and capabilities.

Which Is the Easiest Large Language Model to Use?

Both Bard and ChatGPT require an account to use the service. Both Google and OpenAI accounts are easy and free to create, and you can immediately start asking questions.

However, to run LLaMa locally, you will need to have some specialized knowledge or the ability to follow a tutorial. You'll also need a significant amount of storage space.

Which Is the Most Private Large Language Model?

Both Bard and ChatGPT have extensive privacy policies, and Google repeatedly stresses in its documents that you should "not include information that can be used to identify you or others in your Bard conversations."

By default, Google collects your conversations and your general location based on your IP address, your feedback, and usage information. This information is stored in your Google account for up to 18 months. Although you can pause saving your Bard activity, you should be aware that "to help with quality and improve our products, human reviewers read, annotate, and process your Bard conversations."

Use of Bard is also subject to the standard Google Privacy Policy.

OpenAI's privacy policy is broadly similar, and OpenAI likewise collects your IP address and usage data. In contrast with Google's time-limited retention, OpenAI will "retain your Personal Information for only as long as we need in order to provide our Service to you, or for other legitimate business purposes such as resolving disputes, safety and security reasons, or complying with our legal obligations."

In contrast, a local model on your own machine doesn't require an account or share user data with anyone.

Which LLM Has the Best General Knowledge?

In order to test which LLM has the best general knowledge, we asked three questions.

The first question, "Which national flag has five sides?" was only correctly answered by Bard, which identified the national flag of Nepal as having five sides.

ChatGPT confidently claimed that "There is no national flag that has five sides. National flags are typically rectangular or square in shape, characterized by their distinct colors, patterns, and symbols".

Our local model came close, stating that "The Indian National Flag has five sides and was designed in 1916 to represent India's independence movement." While this flag did exist and did have five sides, it was the flag of the Indian Home Rule Movement—not a national flag.

None of our models could respond that the correct term for a pea-shaped object is "pisiform," with ChatGPT going so far as to suggest that peas have a "three-dimensional geometric shape that is perfectly round and symmetrical."

All three chatbots correctly identified Franco Malerba as an Italian astronaut and member of the European Parliament, with Bard giving an answer worded identically to a section of Malerba's Wikipedia entry.

Which LLM Is Good for Technical Instructions?

When you have technical problems, you might be tempted to turn to a chatbot for help. While technology marches on, some things remain the same. The BS 1363 electrical plug has been in use in Britain, Ireland, and many other countries since 1947. We asked the language models how to correctly wire it up.

The cable entering the plug has a live wire (brown), an earth wire (yellow/green), and a neutral wire (blue). These must be attached to the correct terminals within the plug housing.

Our Dalai implementation correctly identified the plug as "English-style," then veered off-course and instead gave instructions for the older round-pin BS 546 plug together with older wiring colors.

ChatGPT was slightly more helpful. It correctly labeled the wiring colors and gave a materials list and a set of eight instructions. ChatGPT also suggested putting the brown wire into the terminal labeled "L," the blue wire into the "N" terminal, and the yellow/green wire into "E." This would be correct if BS 1363 terminals were labeled, but they aren't.

Bard identified the correct colors for the wires and instructed us to connect them to Live, Neutral, and Earth terminals. It gave no instructions on how to identify these.

In our opinion, none of the chatbots gave instructions sufficient to help someone correctly wire a BS 1363 electrical plug. A concise and correct response would be, "Blue on the left, brown on the right."

Which LLM Is Good for Writing Code?

Python is a useful programming language that runs on most modern platforms. We instructed our models to use Python and "Build a basic calculator program that can perform arithmetic operations like addition, subtraction, multiplication, and division. It should take user input and display the result." This is one of the best programming projects for beginners.
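
For reference, here's a minimal sketch of the kind of program we were asking for. It's our own illustration, not output from any of the models:

# A basic calculator of the sort we asked each model to write
operations = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

a = float(input("First number: "))
op = input("Operation (+, -, *, /): ")
b = float(input("Second number: "))

if op not in operations:
    print("Unknown operation:", op)
elif op == "/" and b == 0:
    print("Cannot divide by zero")
else:
    print("Result:", operations[op](a, b))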

While both Bard and ChatGPT instantly returned usable and thoroughly commented code, which we were able to test and verify, none of the code from our local model would run.

Which LLM Tells the Best Jokes?

Humor is one of the fundamentals of being human and surely one of the best ways of telling man and machine apart. To each of our models, we gave the simple prompt: "Create an original and funny joke."

Fortunately for comedians everywhere and the human race at large, none of the models were capable of generating an original joke.

Bard rolled out the classic, "Why did the scarecrow win an award? He was outstanding in his field".

Both our local implementation and ChatGPT offered the groan-worthy, "Why don't scientists trust atoms? Because they make up everything!"

A derivative but original joke would be, "How are Large Language Models like atoms? They both make things up!"

You read it here first, folks.

No Chatbot Is Perfect

We found that while all three large language models have their advantages and disadvantages, none of them can replace the real expertise of a human being with specialized knowledge.

While both Bard and ChatGPT gave better responses to our coding question and are very easy to use, running a large language model locally means you don't need to be concerned about privacy or censorship.

If you'd like to create great AI art without worrying that somebody's looking over your shoulder, it's easy to run an art AI model on your local machine, too.

 

Resume Entries

 

Created and taught class "Underwater Acoustic Sensor Systems" for Raytheon (Hughes Aircraft)

Created and taught class "Radar Systems" for University of California, San Diego Extension

Performed detailed system modeling and performance prediction for various sonar systems, including SURTASS and Mk48 Mod 5 ACAP

Performed detailed system modeling and performance prediction for various synthetic aperture radar systems, including AN/APY-8 for unmanned aerial vehicles (UAVs)

Performed a study of various AI approaches for ESM and data fusion, including CNN, DL, Dempster-Shafer, and Bayesian probabilistic methods.

Performed detailed system modeling and performance prediction for the Taiwan Link 16 hybrid integrated command and control system using OPNET, produced system specifications, and managed requirements in a DOORS database during program development.