- 6 Tools to Search Source Code for Developers in Linux
- 1) ACK
- 2) Ag — the silver searcher
- 3) ripgrep (rg)
- 4) Sift
- 5) pt — The platinum searcher
- 6) Git grep
- Conclusion
- Linux source code search
- Linux/Reading the Linux Kernel Sources
- Contents
- Reading the Linux Kernel Sources
- Introduction to the Linux Kernel Source
- Where the CPU Starts Executing
- Where "User Space" is Started
- End of the Road
6 Tools to Search Source Code for Developers in Linux
Linux offers many tools for searching text in files within a directory tree from the command prompt/shell. The oldest and most widely used is grep, which stands for "global regular expression print". grep has some drawbacks; for example, it is not especially fast when searching source code files. Another text/pattern searching tool, ack, was designed specifically for searching text inside source code. A good search tool is a lifeline for developers who rely on the shell prompt, an editor like vi or emacs, or an IDE for writing code.
In this article, I will cover the basics of a few search tools that make it easier to search for text inside files.
The search tools that we will explore in this tutorial are:
1) ACK
Ack is a code-searching tool, similar to grep but optimized for programmers searching large trees of source code. It is written in pure Perl, so it is highly portable and runs on any platform that runs Perl. By default, ack searches directories recursively and ignores common version control directories such as .git and .svn. It also ignores binary files, image/music/video files, and gzip/zip/tar archives. The output of ack highlights matches and is clearly formatted.
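The default behaviour described above (recurse, skip version control directories, skip binary files) can be modelled in a few lines. This is only a rough Python sketch of the idea, not how ack is implemented: the names `SKIP_DIRS`, `looks_binary`, and `search_tree` are made up for illustration, the skip list is much shorter than ack's real defaults, and the NUL-byte check is an assumed heuristic.

```python
import os
import re

# Hypothetical, abbreviated skip list; ack's real defaults are longer.
SKIP_DIRS = {".git", ".svn", ".hg"}


def looks_binary(path, probe_size=1024):
    """Assumed heuristic: a NUL byte near the start means 'binary'."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(probe_size)


def search_tree(root, pattern):
    """Yield (path, line_number, line) for each matching text line."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if looks_binary(path):
                continue
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")
```

Calling `list(search_tree(".", r"TODO"))` would return the matches under the current directory while never opening anything below `.git`.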
Install ACK in Ubuntu
Install ACK in CentOS
In Ubuntu, there is already a package named 'ack' that has nothing to do with searching, so the packagers renamed this search tool ack-grep. Once you have installed it using apt-get, you can shorten its name to ack using the following command.
To see all the options available for the ack command, use the following man command
2) Ag — the silver searcher
Ag is a code-searching tool like ack, but it is significantly faster. Compared to ack, it can search through compressed files and has better editor (vim) integration. Ag ignores file patterns from .gitignore and .hgignore. Basic usage of Ag is simple: cd to the directory you want to search and run ag blah to find instances of "blah". In its author's benchmark, The Silver Searcher was 34 times faster than ack when searching for the same text in source files.
Install Ag in Ubuntu
Install Ag in CentOS
To see all the options available for the ag command, use the following man command
3) ripgrep (rg)
Ripgrep is a line-oriented search tool that combines the usability of The Silver Searcher (similar to ack) with the raw speed of GNU grep. ripgrep works by recursively searching your current directory for a regex pattern. ripgrep has first-class support on Windows, Mac and Linux, with binary downloads available for every release.
Ripgrep is faster than both The Silver Searcher and GNU grep. Like The Silver Searcher, ripgrep defaults to recursive directory search and won’t search files ignored by your .gitignore files. It also ignores hidden and binary files by default. Ripgrep can search specific types of files. For example, ‘rg -tpy foo’ limits your search to Python files and ‘rg -Tjs foo’ excludes JavaScript files from your search. Unlike GNU grep, ripgrep stays fast while supporting Unicode.
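The include/exclude type filters mentioned above (rg -tpy, rg -Tjs) amount to matching file names against a table of globs per type. The Python sketch below models that idea only; `TYPE_GLOBS` is a hypothetical, heavily abbreviated table and `select_files` is an illustrative helper, not part of ripgrep.

```python
from fnmatch import fnmatchcase

# Hypothetical, abbreviated type table; ripgrep ships a much larger one.
TYPE_GLOBS = {
    "py": ["*.py"],
    "js": ["*.js", "*.jsx"],
    "c": ["*.c", "*.h"],
}


def matches_type(filename, type_name):
    """True if the file name matches any glob registered for the type."""
    return any(fnmatchcase(filename, glob) for glob in TYPE_GLOBS[type_name])


def select_files(filenames, include=(), exclude=()):
    """Keep files of the included types (if any) and drop excluded types."""
    selected = []
    for name in filenames:
        if include and not any(matches_type(name, t) for t in include):
            continue
        if any(matches_type(name, t) for t in exclude):
            continue
        selected.append(name)
    return selected
```

Here `include=["py"]` plays the role of -tpy and `exclude=["js"]` the role of -Tjs; with neither given, every file is kept.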
Installation of ripgrep binary in Ubuntu/CentOS
The usage of rg is described on the GitHub page of ripgrep.
4) Sift
Sift is another search tool, developed with both speed and flexibility in mind. Sift uses the Perl-compatible regular expression format and the basic options known from grep, but with usable defaults. It can select or exclude targets based on file name, directory name, path, and type. Like the earlier search tools, sift understands .gitignore files and can be configured to show results only in relevant files. Sift has multiline support and can replace output to reformat it to your needs without relying on awk/sed. Sift can also search through gzip files and can handle searches inside big files of more than 50 GB. Another useful feature of sift is that you can specify various conditions while searching text, such as:
→preceded by A
→followed by B within X lines
→if the file also contains a line with C
→if the file contains D in the first Y lines
→any combination of the available conditions
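To make one of these conditions concrete, here is a small Python sketch of "followed by B within X lines". It models the idea only and does not use sift's option names or implementation. The function name is hypothetical, and the choice to count only the lines after the match (not the matching line itself) is an assumption.

```python
import re


def matches_followed_by(lines, pattern, follower, within):
    """Return 1-based numbers of lines matching `pattern` that have a line
    matching `follower` among the next `within` lines (assumed semantics)."""
    main = re.compile(pattern)
    after = re.compile(follower)
    hits = []
    for index, line in enumerate(lines):
        if not main.search(line):
            continue
        # Only look at the window of lines that follow the match.
        window = lines[index + 1 : index + 1 + within]
        if any(after.search(candidate) for candidate in window):
            hits.append(index + 1)
    return hits
```

The other conditions in the list (preceded by A, file also contains C) could be written the same way by changing which window of lines is inspected.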
sift comes as a single executable with no dependencies and is available for all major platforms, so you can install it easily on any of them.
Download sift from the download section of the official sift site, unzip it, and move it to any directory listed in the PATH environment variable.
For sift usage, check the documentation at sift-tool.org
5) pt — The platinum searcher
Another source code search utility similar to ack and ag is The Platinum Searcher (pt), which is written in the Go programming language. It is claimed to be 3 to 5 times faster than ack. Pt is written in a memory-safe language and uses Go’s standard regexp package, which lets it avoid exponential-time matching. The Platinum Searcher can search not only files encoded in UTF-8 but also EUC-JP and Shift_JIS, making it very useful for Japanese programmers.
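Searching files in several encodings boils down to decoding the bytes correctly before matching. The Python sketch below shows one naive way to do that by trying each encoding in turn; it is an illustration of the problem, not pt's actual detection logic. `decode_guess` and `grep_bytes` are hypothetical helpers, and trial decoding can misidentify some inputs (for example, Shift_JIS text made only of half-width katakana may also decode as EUC-JP).

```python
# Encodings tried in order; the order itself is an assumption.
ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def decode_guess(data):
    """Return (text, encoding) for the first encoding that decodes cleanly,
    or (None, None) if none of them does."""
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None


def grep_bytes(data, needle):
    """Return the decoded lines of `data` that contain `needle`."""
    text, _encoding = decode_guess(data)
    if text is None:
        return []
    return [line for line in text.splitlines() if needle in line]
```

With this, `grep_bytes(raw, "けんさく")` finds the word whether `raw` was read from a UTF-8, EUC-JP, or Shift_JIS file, as long as the guess is right.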
Installing and using pt
The Platinum Searcher binaries are available for Windows, Mac OS X, and Linux (including ARM) from its GitHub releases page. Download the binary, move it to a directory listed in $PATH, and start searching.
To search for a pattern in the current directory and all of its subdirectories, simply type:
Usage:
pt [OPTIONS] PATTERN [PATH]
6) Git grep
Git grep searches for a regular expression in a Git repository. In a way, it is just a find/grep combination, but very concise and fast. Git grep is a great tool for finding all uses of and references to a symbol in a git repository. There is no separate installation for git grep; it is installed along with git.
For usage of git grep, check the git-grep manual page
Conclusion
There are a few other search utilities available, such as zgrep, agrep, xmlgrep, and pdfgrep. Among the search tools discussed above, ripgrep is the fastest and is cross-platform, whereas The Silver Searcher (ag) is a faster alternative to ack. Grep is written in C but does not ignore any files while searching, while ack is written in Perl and is very good at ignoring files.
Linux source code search
The Silver Searcher
A code searching tool similar to ack , with a focus on speed.
Do you know C? Want to improve ag? I invite you to pair with me.
What’s so great about Ag?
- It is an order of magnitude faster than ack .
- It ignores file patterns from your .gitignore and .hgignore .
- If there are files in your source repo you don’t want to search, just add their patterns to a .ignore file. (*cough* *.min.js *cough*)
- The command name is 33% shorter than ack , and all keys are on the home row!
Ag is quite stable now. Most changes are new features, minor bug fixes, or performance improvements. It’s much faster than Ack in my benchmarks:
Ack and Ag found the same results, but Ag was 34x faster (3.2 seconds vs 110 seconds). My /code directory is about 8GB. Thanks to git/hg/ignore, Ag only searched 700MB of that.
How is it so fast?
- Ag uses Pthreads to take advantage of multiple CPU cores and search files in parallel.
- Files are mmap() ed instead of read into a buffer.
- Literal string searching uses Boyer-Moore strstr.
- Regex searching uses PCRE’s JIT compiler (if Ag is built with PCRE >=8.21).
- Ag calls pcre_study() before executing the same regex on every file.
- Instead of calling fnmatch() on every pattern in your ignore files, non-regex patterns are loaded into arrays and binary searched.
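As an illustration of the literal-search item in the list above, here is a Python sketch of Boyer-Moore-Horspool, a simplified member of the Boyer-Moore family. It is not Ag's C implementation; the function name is made up, and the sketch only shows why such a search can skip ahead by more than one byte at a time.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: for each byte in the needle (except the last
    # position), how far the window may shift when that byte is aligned
    # with the end of the window. Later positions overwrite earlier ones.
    shift = {byte: m - 1 - i for i, byte in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Bytes that never occur in the needle allow a full-length skip.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

The result should agree with `bytes.find` for any input; the gain in a real implementation comes from skipping up to `len(needle)` bytes per step on mismatches.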
I’ve written several blog posts showing how I’ve improved performance. These include how I added pthreads, wrote my own scandir() , benchmarked every revision to find performance regressions, and profiled with gprof and Valgrind.
Ubuntu >= 13.10 (Saucy) or Debian >= 8 (Jessie)
Fedora 21 and lower
Unofficial daily builds are available.
- This installs a release of ag.exe optimized for Windows.
- winget is intended to become the default package manager client for Windows. As of June 2020, it’s still in beta, and can be installed using instructions there.
- The setup script in Ag’s winget package installs ag.exe in the first directory that matches one of these criteria:
  - Over a previous instance of ag.exe from the same origin found in the PATH
  - In the directory defined in environment variable bindir_%PROCESSOR_ARCHITECTURE%
  - In the directory defined in environment variable bindir
  - In the directory defined in environment variable windir
Run the relevant setup-*.exe, and select "the_silver_searcher" in the "Utils" category.
Building from source
Install dependencies (Automake, pkg-config, PCRE, LZMA):
Linux/Reading the Linux Kernel Sources
Reading the Linux Kernel Sources
Part of an ongoing Linux Kernel exploration by SVLUG — the Silicon Valley Linux Users Group
Good places to follow along:
Look out for this: sometimes code gets moved from one file or one directory to another, or converted from raw assembler (in the arch area) into C and categorized. Therefore the LXR site, which lets you surf the older source trees, can be very interesting. A specific example of this is the eventual combining of several closely related architectures into one arch with sub-architectures.
- Wikipedia handy words
- Linux kernel
- Extended memory
- computer interrupt
- computer pointers
- the C programming language
- Potentially useful books
- Assembly Language Step by Step by Jeff Duntemann (Wiley 2000), ISBN 0471375233: section on the gas assembly language syntax (Chapter 12), INTs used in Linux (Chapter 13), etc.
- Linux Device Drivers, 3rd edition, Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman (O’Reilly 2005), ISBN 0-596-00590-3: covers device drivers for kernel 2.6.10. Full text available for free at LWN
- See also Ralf Brown’s x86 interrupt list
Introduction to the Linux Kernel Source
Traditional computer science programming courseware, for the most part, uses source code examples that are over-simplified and academic, giving little insight into how things work in the real world, and into the forces that change source code over time.
The Linux kernel is one of the most widely used pieces of production source code in the world today. It’s been under continuous development for over two decades, and has seen ever-growing popularity and usage. In that respect, it’s one of the more seminal works of source code for any aspiring engineer to study.
Studying the Linux kernel sources poses a number of challenges. The kernel comprises over 15 million lines of code. In addition, the kernel is a unique "program": one which provides the most fundamental support for the system’s hardware, libraries, and the applications which run under it. So studying a kernel is considerably more complicated than studying a typical user space C program. We can’t just start at int main(int argc, char **argv) and proceed from there.
The Linux kernel has a number of distinct entry points. It also contains a small amount of assembly code required for firmware loads and kernel boot sequences. Naturally the Linux kernel cannot link with the standard C libraries: at their core the C libraries depend on system calls which are provided by the kernel, so that would present a classic chicken-and-egg problem.
Here we’ll present a number of tips for studying the kernel sources. We’ll then provide an outline for one sequence of topics which can be studied in a progressive and productive way. The outline will be primarily presented as links via Cross-referencing Linux.
We’ll also rely heavily on other online resources such as the LDP Kernel Hacker’s Guide (which, although quite dated can give us some historical perspective on the code from a time when the kernel was somewhat smaller and simpler).
Get Linux first release 0.01
Where the CPU Starts Executing
One approach would be to start where the CPU starts executing a freshly loaded kernel image.
Naturally we could ask ourselves: "What is the first part of the compiled kernel that will be executed?" Another way of asking this is: "What is the kernel’s 'entry point'?" (Unfortunately, searching on the obvious Google terms, "Linux kernel 'entry point'", leads to discussions of how system calls transition from user space into kernel space, which is a different sort of "entry point", though one well worth studying.)
In order to answer that question we have to consider how a kernel gets loaded into memory and started on a system. When a computer starts up, the system memory (RAM) is empty; the only available operating instructions are in ROM (read-only memory), also called the firmware.
Naturally the firmware is different for each system architecture. Thus PCs, PowerPC (older Macintosh systems, IBM POWER, RS/6000 workstations, etc.), DEC Alpha, MIPS, SPARC, and other systems each have their own low-level code and their own loading conventions. Any given compilation of the kernel will have at least one such entry point.
It’s even more complicated than that, however, because in some cases, such as the PC, there are a number of different ways that the kernel can be loaded: for example from hard disk, floppy, CD-ROM, or over a network (e.g. via PXE). Older Linux kernels (before circa 2003) could be dumped onto a raw floppy diskette (using the dd command) and booted from there. We see where that code used to be here:
As the error message here indicates, this raw boot model is no longer supported. So we have to use some program like syslinux, lilo, or grub to load the kernel and jump into the code.
Another complication is that the normal PC Linux kernel image is compressed. There is a small header of code which does some minimal memory management and decompresses the rest of the image into RAM. Then that bit of code jumps into the kernel. The PC (x86) code for decompressing a kernel image can be found in . /arch/i386/boot/compressed/head.S, with other architectures’ decompression code found by simply swapping out i386 for your architecture of choice.
Of course, this is so tied to the architecture that it is in assembly language, making it somewhat more difficult for C programmers to read. There are a few comments at the beginning of this file which reveal a little about the expectations/assumptions that the code must make about how control has been passed to it. These become requirements for how the bootloader (syslinux, lilo, grub) must prepare things before the processor jumps to the instruction after the startup_32 label. It uses these arguments to find and decompress the kernel into the correct place and then jumps to the "correct place," which the code only refers to using a register.
According to the comments, the point that the decompression code jumps to is found in arch/$YOUR_ARCH/kernel/head.S and is marked with the macro ENTRY(stext). Again, the trail fades out. Calculated values are jumped to and macros are invoked. The next hint can be found in the Linux Kernel 2.4 Internals page: after the initial assembly, execution jumps to start_kernel() in init/main.c. From there the code becomes much easier to read.
Is this correct? Is any point in the Linux kernel sources called before this one when loading a compressed kernel? By convention, the filename head.S is used for this kind of entry code. startup_32 is the code called by setup.S, which performs the transition into protected mode. setup.S also #includes video.S and seems to call into the subroutines defined there. Is there currently any way to load an uncompressed Linux kernel on a 32-bit PC? How about under x86_64? Is it possible to step through the code with a JTAG-based source-level debugger in our weekly meeting? Even though this may affect timing, it can tell us what code is actually executed, the calling sequence, and the values of variables. Is it possible to step through the code with a JTAG-based source-level debugger on some embedded Linux device that boots Linux directly from flash? (That keeps us from getting bogged down in the details of the MBR/bootloader, which is unnecessarily complicated on many x86 machines.) Perhaps Tomato (firmware)?
Where to Therefrom?
Even if we choose to pursue this path of exploring the kernel, where does the chain of execution go from there? At the end of head.S we see the code clear the EBX register and jump to the address which is contained at that point in the EBP (base pointer) register. The preceding code was calculating possible offsets between the kernel’s compiled-in load/start address and the location at which it was actually loaded (if it was built as "relocatable").
Notice that the jump is not to a symbolic label, and the comments in the sources don’t tell us where to read next.
It should be obvious that starting our reading where the CPU starts executing might present some challenges that we’re not quite ready to tackle.
Where "User Space" is Started
Every good Linux systems administrator should know that the classic UNIX kernel starts exactly one normal user space program: init. So, perhaps this is a useful place to start reading the kernel sources. We know that several things have to happen before init is started: the root filesystem must be located and mounted, the initial console must be opened and connected to file descriptors 0, 1, and 2 (stdin, stdout, and stderr respectively), and the initial environment must be created.
Each prerequisite has its own prerequisites: the block device on which the rootfs is hosted must be detected and initialized; the memory limits must be scanned (or otherwise auto-detected); any memory management unit (MMU) and programmable interrupt controllers (PICs, APICs, IOAPICs) must be detected, enumerated, and programmed; etc.
So we could find where the init process is created and trace backwards from there to learn more about how the system is prepared for its ultimate mission (running our programs, one would think).
We can also ask ourselves what happens after the init process has been started. Of course any competent sysadmin knows what happens out in user space: init reads /etc/inittab and executes all of the processes described there.
From that we can intuit at least some of what the kernel must be doing.
Clearly the scheduler must be running, giving the init process and each of its descendants time to execute code in user space.
Any good UNIX programmer knows that code in user space can only do some very basic operations on its own: basically computation, arithmetic, and string operations on memory that’s already allocated to the process. Everything else involves access to files, devices, or other system calls (requests to the kernel to provide a service to the program).
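The split between pure computation and system calls can be seen from user space. In the Python sketch below, each `os.*` call is a thin wrapper around the system call named in its comment, while the final `.upper()` is computation on memory the process already owns. The function name is hypothetical, and the sketch assumes a POSIX-like system where a temporary file can be created.

```python
import os
import tempfile


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Write bytes to a file and read them back using raw descriptors."""
    fd, path = tempfile.mkstemp()          # open(2) happens inside mkstemp
    try:
        os.write(fd, payload)              # write(2)
        os.lseek(fd, 0, os.SEEK_SET)       # lseek(2)
        data = os.read(fd, len(payload))   # read(2)
    finally:
        os.close(fd)                       # close(2)
        os.unlink(path)                    # unlink(2)
    # No kernel involvement here: plain computation in user space.
    return data.upper()
```

Running the function under a tracer such as strace would show the listed system calls; that tool is mentioned only as a suggestion and is not part of the original text.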
So we can also look forward, finding out how the kernel handles system calls, and how it provides filesystems.
We can see here that the kernel attempts to start /sbin/init, /etc/init, and /bin/init, and finally falls back to trying /bin/sh. If we read backwards a little bit, we see where the kernel tries to execute whatever program was passed via the init= kernel argument by the boot loader (and we can follow the hyperlinks to find where the command line was parsed for such an option). Going back a little further, we can see where the kernel tries to start /init (formerly /linuxrc) if there was one in an initial RAM disk.
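The search order just described can be modelled as a short list of candidates tried in turn. The Python sketch below is a user-space model of that order only, not kernel code: `candidate_order` and `choose_init` are hypothetical names, the set of "executables" stands in for the real attempt to execute each path, and how a given kernel version reacts when the init= program fails is simplified away.

```python
# Fallback paths, in the order given in the text above.
FALLBACK_INITS = ("/sbin/init", "/etc/init", "/bin/init", "/bin/sh")


def candidate_order(init_arg=None, has_ramdisk_init=False):
    """List the programs to try, earliest first."""
    candidates = []
    if has_ramdisk_init:
        candidates.append("/init")        # from an initial RAM disk
    if init_arg:
        candidates.append(init_arg)       # from the init= boot argument
    candidates.extend(FALLBACK_INITS)
    return candidates


def choose_init(executables, init_arg=None, has_ramdisk_init=False):
    """Return the first candidate present in `executables`, else None.
    (Where this model returns None, a real kernel would give up.)"""
    for path in candidate_order(init_arg, has_ramdisk_init):
        if path in executables:
            return path
    return None
```

For example, `choose_init({"/bin/sh"})` returns "/bin/sh", which mirrors the last-resort shell fallback.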
We can compare the current version of this file to the oldest one available (from the 1.0.9 kernel version) and see how much is recognizable, and then consider the changes that have accrued over the years, and ponder why those changes were made.
After init Has Been Started
A traditional UNIX kernel only starts one normal user space process, init, and thereafter it assumes its role as the mediator between user space and the system hardware resources.
Show how the kernel spawns modprobe and hotplug utilities as a counterpoint to the traditional UNIX model
Primarily the kernel schedules CPU time to processes, dispatches signals to them, and handles system calls for them. That’s the view of the kernel from the perspective of the applications programmer.
Show how the kernel handles system calls via entry.S et al. Compare this to the sysenter VDSO technique. Other memory mapping tricks?
From another perspective, the kernel services interrupts (hardware events).
The first and most obvious would be the periodic events from the system clock; on a PC, those come from a PIT (programmable interval timer). The system clock interrupt becomes the heartbeat of the system. During this interrupt the kernel updates the "jiffies" value, possibly updates the kernel real-time clock values, and schedules user space processes and kernel tasks.
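The "heartbeat" role of the clock interrupt can be illustrated with a toy model. The Python sketch below is not how the Linux scheduler works: `ToyKernel`, its fixed time slice, and the round-robin queue are all assumptions made for illustration. It shows only that each tick bumps a jiffies counter and may trigger a switch to the next task.

```python
from collections import deque


class ToyKernel:
    """Toy round-robin model driven by timer ticks (illustrative only)."""

    def __init__(self, tasks, timeslice_ticks=2):
        self.jiffies = 0
        self.run_queue = deque(tasks)
        self.timeslice = timeslice_ticks
        self.remaining = timeslice_ticks
        self.history = []  # which task was running during each tick

    def timer_interrupt(self):
        self.jiffies += 1                       # the heartbeat counter
        self.history.append(self.run_queue[0])  # account the current task
        self.remaining -= 1
        if self.remaining == 0:
            # Time slice used up: move the current task to the back.
            self.run_queue.rotate(-1)
            self.remaining = self.timeslice
```

Calling `timer_interrupt()` five times on `ToyKernel(["init", "sh"])` leaves `jiffies` at 5 and shows the two tasks alternating every two ticks in `history`.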
(Interrupt handling for most I/O devices is divided into two parts, traditionally called the "top half" and the "bottom half". Following the usage in the Linux Device Drivers book, the top half normally does the minimal amount of work that saves enough state and status to allow the rest of the work to be deferred until the bottom half can be scheduled to run, so bottom halves are one source of kernel tasks that can require scheduling. Additionally, the Linux kernel maintains a number of kernel threads which appear in normal ps listings as processes with names enclosed in square brackets; they exist in that form so that the kernel can schedule them using the same mechanisms as for any user space processes.)
An earlier version of this explanation used the two terms the other way round, and w:Talk:Interrupt_handler#Wrong also mentions that some people use top/bottom with the meanings swapped. Perhaps, here at Wikiversity, we should use the unambiguous "first level interrupt handler" and "second level interrupt handler" terminology.
Find the code responsible for handling the system clock interrupt! Find one of the simple kernel thread tasks, such as krecoveryd, and show how it’s initialized and what it does during its time slices. Discuss the idle task.
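The deferral pattern can be sketched in a few lines, using the Linux Device Drivers convention in which the top half is the minimal part. This Python toy is only a model of the idea and has nothing to do with real kernel mechanisms; `ToyDevice` and its methods are hypothetical, and upper-casing the event stands in for the slow work.

```python
from collections import deque


class ToyDevice:
    """Model of split interrupt handling (illustrative only)."""

    def __init__(self):
        self.pending = deque()   # state saved by the top half
        self.processed = []      # results produced by the bottom half

    def top_half(self, raw_event):
        # First-level handler: record just enough state and return quickly.
        self.pending.append(raw_event)

    def bottom_half(self):
        # Second-level handler: runs later, when scheduled, and does the
        # slower work on everything queued so far.
        while self.pending:
            self.processed.append(self.pending.popleft().upper())
```

Several calls to `top_half` can pile up before a single `bottom_half` run drains the queue, which is the point of deferring the work.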
End of the Road
While it’s useful to start, in some sense, where the user space interactions with the kernel "start", it’s also useful to have some idea how it all will end. Once the kernel has started and settled into its role (handling system calls, dispatching signals, servicing interrupts), it will do so until .