Using Text Manifests to Load in Sample Patient Information

         August 3, 2023

The ability to import patient-level information into VarSeq and VSClinical can not only save time while evaluating a sample but can be a critical step in automating project creation through VSPipeline. Here we will review how to construct a basic text manifest and how to apply that manifest to a project. Those fields can then be rendered into the final clinical report.

What goes into a text manifest:

There are many different fields that can be brought into the final report via the text manifest, including those that are germline or somatic in nature. For any of these fields, there is the ability to manually add the information into the VSClinical evaluation, which may be a viable option if you are processing a small number of samples at once. When processing many samples, however, a text manifest can both speed up the process and help keep the information between samples clear. Here we can review some of the fields found in an ACMG-based evaluation (Figure 1). These include the sample level details, including the Sample ID, the collection method, collection dates, and more.

Figure 1: Sample detail fields in VSClinical.

You will also have the ability to bring in patient-specific information, such as the patient name, which usually differs from the Sample ID (Figure 2). Fields like the date of birth or indication for testing may also be important for the final report.

Figure 2: Patient information fields in VSClinical.

The patient details may include fields such as the Ordering Physician, the Facility, or the Additional field for other notes needed for the final report (Figure 3).

Figure 3: Patient details fields in VSClinical.

Finally, specific to the ACMG analysis, there is a section for family information (Figure 4). This is especially helpful when running a Trio analysis, where the parents’ information will be added to the final report.

Figure 4: Family information fields in VSClinical.

For a somatic workflow, there are many tumor-specific fields that can be brought in on a manifest, including tumor type, purity of the tumor, and more. When bringing information into VSClinical, and then the final report, through a text manifest, know that you are not limited to the fields shown here. If the information can be presented in a delimited text format, then with report customization it can likely be brought into the final report.

Creating a Text Manifest:

There are several different ways to make a text manifest. A fairly straightforward method is by using Excel, as shown below. When crafting the manifest, include only the specific field headers. For example, in Figure 5, there is an overarching field called ‘Sample Info.’ This field is unnecessary and will cause an error in VarSeq on import. The fields that are needed are seen on line two, such as the sample name, affection status, and sex of the patient.

Figure 5: Example of an unnecessary field (line 1) in a text manifest.
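For manifests produced outside of Excel, a quick script can flag the kind of extra header row shown in Figure 5 before import. The sketch below is a heuristic of our own, not VarSeq's validation logic: it treats the first row as the header and reports any column that holds values but has no header name, plus any repeated header name. The function name `check_header` and the sample values are hypothetical.

```python
import csv
import io


def check_header(manifest_text: str) -> list[str]:
    """Return a list of likely header problems in comma-delimited manifest text."""
    # Ignore completely blank rows so they do not count as the header.
    rows = [r for r in csv.reader(io.StringIO(manifest_text)) if any(c.strip() for c in r)]
    if not rows:
        return ["manifest is empty"]

    header, body = rows[0], rows[1:]
    width = max(len(r) for r in rows)
    # Pad the header so every column position has an entry, even if blank.
    named = [c.strip() for c in header] + [""] * (width - len(header))

    problems = []
    for i, name in enumerate(named):
        used = any(i < len(r) and r[i].strip() for r in body)
        if not name and used:
            # Typical symptom of an overarching banner row above the real headers.
            problems.append(f"column {i + 1} has values but no header name")
    for name in sorted({n for n in named if n and named.count(n) > 1}):
        problems.append(f"header name {name!r} appears more than once")
    return problems


# Made-up manifests: the first has an extra 'Sample Info' row above the field headers.
bad = "Sample Info,,\nSample,Affection Status,Sex\nS1,Affected,Female\n"
good = "Sample,Affection Status,Sex\nS1,Affected,Female\n"

print(check_header(bad))
print(check_header(good))
```

An empty list means the first row names every column that carries data; it does not guarantee that VarSeq will accept the file.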

Here is an example of a slightly more fleshed-out text manifest, where we have included the patient names and corresponding dates of birth (Figure 6). Starting with this simple text manifest, let’s walk through the next steps for getting that information into the VarSeq project.

Figure 6: Example of a functional manifest in Excel.

First, we need to save it in the appropriate file format. VarSeq will not accept the Excel Workbook format; it will only take a text document saved as CSV UTF-8 (comma delimited), as seen below (Figure 7).

Figure 7: Saving the manifest to CSV UTF-8 format.
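If the patient information already lives in another system, the same kind of file can be written without Excel. The following is a minimal sketch using Python's standard `csv` module. The header names in `FIELDS`, the date format, and the patient values are assumptions for illustration; use the field names your project and report template expect. The `utf-8-sig` encoding adds a byte-order mark, as Excel's CSV UTF-8 option typically does; whether VarSeq needs the mark is not stated here, so plain `utf-8` may work equally well.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical header names; replace with the fields your project uses.
FIELDS = ["Sample", "Sex", "Patient Name", "Date of Birth"]


def write_manifest(path, rows):
    """Write rows (a list of dicts keyed by FIELDS) as a comma-delimited UTF-8 file."""
    # newline="" lets the csv module control line endings;
    # "utf-8-sig" places a byte-order mark at the start of the file.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()  # the field headers are the first and only header row
        writer.writerows(rows)


# Made-up example patients, not real data.
rows = [
    {"Sample": "Sample_01", "Sex": "Female", "Patient Name": "Jane Example", "Date of Birth": "1980-01-15"},
    {"Sample": "Sample_02", "Sex": "Male", "Patient Name": "John Example", "Date of Birth": "1975-06-30"},
]

# Write to a temporary folder here; in practice, point this at your project folder.
with tempfile.TemporaryDirectory() as folder:
    manifest_path = Path(folder) / "manifest.csv"
    write_manifest(manifest_path, rows)
    print(manifest_path.read_text(encoding="utf-8-sig"))
```

Because `DictWriter` raises an error for any row key that is not listed in `FIELDS`, a misspelled field name is caught when the file is written rather than at import time.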

When saving into this format with Excel, you will get a warning that data loss can happen in CSV format (Figure 8). For our purposes, you can ignore this warning and save your manifest.

Figure 8: Ignore the data loss warning for this purpose.

Now that you have a text manifest describing data for your patients, the key is to add the manifest to your project. You can do this while importing VCFs to your project for the first time with the Add Sample Fields: From Text File option (Figure 9), or if your project has already been created, you can add these fields after the fact by going to File > Set Sample Fields (Figure 10), as seen below.

Figure 9: Adding sample fields through the text file option.
Figure 10: Accessing the additional sample fields options.

After clicking on Add Sample Fields > From Text File, you will see a screen similar to the one below (Figure 11). Here you will Browse to the location where you have saved your manifest, and you will look for the CSV format file. Then, click Open.

Figure 11: Adding the manifest from the import samples screen.

The new fields are now available to be brought in. Make sure to click ‘Select All’ to turn the new fields purple, indicating they will be imported (Figure 12). Then click OK.

Figure 12: Selecting all fields from the import sample information screen.

On the next screen, you can scroll to the right, and the new patient-level information will be associated with their specific samples (Figure 13). Hit OK.

Figure 13: Confirming additional fields have been added.
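Since the manifest rows are matched to samples by name, it can help to compare the two lists before importing. The sketch below reads sample names from the `#CHROM` header line of a VCF (in the VCF format, the columns after `FORMAT` are the sample names) and compares them with a manifest column. The column name `Sample` and both function names are hypothetical, and the check is our own pre-import aid rather than anything VarSeq runs.

```python
import csv
import io


def vcf_sample_names(vcf_text: str) -> list[str]:
    """Return the sample names listed on the #CHROM header line of VCF text."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # Columns 0-8 are CHROM..FORMAT; everything after is a sample name.
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def compare_samples(vcf_text: str, manifest_text: str, sample_column: str = "Sample"):
    """Return (names only in the VCF, names only in the manifest), each sorted."""
    in_vcf = set(vcf_sample_names(vcf_text))
    reader = csv.DictReader(io.StringIO(manifest_text))
    in_manifest = {row[sample_column].strip() for row in reader}
    return sorted(in_vcf - in_manifest), sorted(in_manifest - in_vcf)


# Made-up inputs: S2 has no manifest row, and S3 is not in the VCF.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
)
manifest_text = "Sample,Sex\nS1,Female\nS3,Male\n"

missing_from_manifest, missing_from_vcf = compare_samples(vcf_text, manifest_text)
print("In VCF but not in manifest:", missing_from_manifest)
print("In manifest but not in VCF:", missing_from_vcf)
```

Names in either list usually point to a typo or a renamed sample, which is easier to fix in the manifest than after the project has been created.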

Back in VarSeq, your new fields will not be immediately visible in your Samples Table (Figure 14). Make sure to right-click on the Sample Info field and select the fields you wish to view.

Figure 14: Selecting all fields on the Samples Table.

As you can see, the Sex, the Patient Name, and the Date of Birth have been updated for the five samples in this project (Figure 15).

Figure 15: Additional fields have been added from the text manifest.

Moving into the VSClinical assessment, you can see the corresponding Patient Name field now has information (Figure 16). This was a fairly simple example, but all fields in this section could be filled in, in addition to custom fields specifically meant for the custom report template.

Figure 16: Additional patient information as seen in VSClinical.

Bringing in sample-specific phenotypes or gene lists:

Now that we have a grasp on a basic text manifest, let’s explore a slightly more advanced use of it. In VarSeq, there are a number of algorithms that take a sample-by-sample input, as opposed to applying a single input to an entire project. For example, you may create a workflow template that applies the phenotype ‘global developmental delay’ to every sample in the project. This is great if you have a large number of samples that share ‘global developmental delay,’ but it is not helpful if you have a mix of samples that each have different and specific phenotypes, gene lists, or panels. In this case, you will want to apply the Per Sample algorithms (Figure 17).

Figure 17: The per sample algorithms under Computed Data.

Here we are looking at the input for Sample PhoRank, where each sample has its own corresponding phenotype (Figure 18). To avoid entering phenotypes for each of these samples at this stage, we can save time by 1) having a Per Sample algorithm built into the project template and 2) building a text manifest that specifies the phenotypes or other per-sample information.

Figure 18: Adding sample-specific phenotypes for use in PhoRank.
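For step 2, the per-sample phenotypes can be appended to an existing manifest as one more column. The sketch below is an illustration under assumptions: the column names `Sample` and `Phenotypes`, the function name `add_phenotypes`, and the sample IDs are all hypothetical, and the header your Per Sample algorithm reads from should be confirmed in your own project. The function also reports samples with no phenotype so the gap is visible before import.

```python
import csv
import io


def add_phenotypes(manifest_text, phenotypes, sample_column="Sample", phenotype_column="Phenotypes"):
    """Return (new manifest text with a phenotype column, samples lacking a phenotype)."""
    reader = csv.DictReader(io.StringIO(manifest_text))
    fieldnames = list(reader.fieldnames or []) + [phenotype_column]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()

    missing = []
    for row in reader:
        sample = row[sample_column]
        if sample not in phenotypes:
            missing.append(sample)  # keep the row, but leave its phenotype cell empty
        row[phenotype_column] = phenotypes.get(sample, "")
        writer.writerow(row)
    return out.getvalue(), missing


# Made-up manifest and phenotype lookup; S2 deliberately has no phenotype.
base_manifest = "Sample,Sex\nS1,Female\nS2,Male\n"
phenotypes = {"S1": "Global developmental delay"}

new_manifest, missing = add_phenotypes(base_manifest, phenotypes)
print(new_manifest)
print("Samples without a phenotype:", missing)
```

The same pattern applies to other per-sample inputs, such as a gene list column, by changing the lookup and the column name.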

Addressing Step 1 of this process, we have created a very basic workflow that uses the Sample level PhoRank, where each sample in this project requires its own input (Figure 19).

Figure 19: Example of a simple filter chain utilizing PhoRank.

Next, we can save that workflow as a template, so that any new samples can undergo the same workflow (Figure 20).

Figure 20: Saving a workflow template.

When I next start a new project and have selected my VCFs, I can go through the same process as before and associate my Text Manifest on import (Figure 21).

Figure 21: Adding sample-specific fields during project creation.

After the project finishes annotating and filtering, I can click on the half-white and half-black box, along with the PhoRank column header, and see that each sample in this project has its own corresponding HPO term for a PhoRank search (Figure 22).

Figure 22: Confirming unique HPO terms have been associated with each sample.

Through this method, I can take a basic workflow and make it multi-purpose by feeding in the sample-specific phenotypes for use in PhoRank. This method also works with other information, like gene lists. If you need help implementing text manifests into your workflow, or if you would like to talk to us about increasing automation in your workflows, please reach out to support@goldenhelix.com, and we will be happy to assist!
