
What’s New for Visual C++ Developers in VS2013 Preview


Since the newer Visual C++ content is not yet live on MSDN, I've copied the key parts of "What's New for Visual C++ Developers" and replicated them below. Note that this post may be removed after the MSDN content has been live for a few weeks.

Thanks for your patience!

Improved ISO C/C++ Standards Support

Compiler

  • Supports the following ISO C++11 language features (a short sketch using several of these follows this list):
    • Default template arguments for function templates.
    • Delegating constructors.
    • Explicit conversion operators.
    • Initializer lists and uniform initialization.
    • Raw string literals.
    • Variadic templates.
  • rvalue/lvalue Reference Casts. With rvalue references, C++11 can clearly distinguish between lvalues and rvalues. Previously, the Visual C++ compiler did not provide this in specific casting scenarios. A new compiler option, /Zc:rvalueCast, has been added to make the compiler conformant with the C++ Language Working Paper (see section 5.4, [expr.cast]/1).

The default behavior when this option is not specified is the same as in Visual Studio 2012.
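
Here is a minimal, self-contained sketch (not from the original article; all names are illustrative) that exercises several of the features listed above:

#include <iostream>
#include <string>
#include <vector>

// Variadic template: prints any number of arguments.
template <typename... Args>
void log_all(const Args&... args)
{
    // C++11 pack expansion via an array initializer (no C++17 fold expressions).
    int dummy[] = { 0, ((std::cout << args << ' '), 0)... };
    (void) dummy;
    std::cout << '\n';
}

struct Widget
{
    std::string name;
    Widget() : Widget("unnamed") {}                            // delegating constructor
    explicit Widget(std::string n) : name(std::move(n)) {}
    explicit operator bool() const { return !name.empty(); }   // explicit conversion operator
};

int main()
{
    std::vector<int> v{ 1, 2, 3 };                   // initializer list / uniform initialization
    log_all(v.size(), R"(a raw "string" literal)");  // raw string literal
    Widget w;
    if (w) std::cout << w.name << '\n';              // uses explicit operator bool()
}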

C99 Libraries

  • C99 functionality added to <math.h>.
  • Complex math functions in new header, <complex.h>.
  • Integer type support in new header, <inttypes.h>; includes format string support for "hh".
  • Support for variable-argument scanf forms in <stdio.h>; the C99 vscanf/vwscanf, strtoll/wcstoll, and isblank/iswblank variants are implemented.
  • New conversion support for long long and long double in <stdlib.h>.
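
As a hedged sketch of a few of these additions in use (written as C++ via the corresponding <c...> headers; illustrative only):

#include <cinttypes>   // format macros such as PRId64 (from <inttypes.h>)
#include <cmath>       // C99 additions such as cbrt (from <math.h>)
#include <cstdio>
#include <cstdlib>     // strtoll / strtold (from <stdlib.h>)

int main()
{
    std::int64_t big = INT64_C(1) << 40;
    std::printf("big = %" PRId64 "\n", big);                      // <inttypes.h> format macros
    std::printf("cbrt(27) = %f\n", std::cbrt(27.0));              // C99 <math.h> function
    long long ll = std::strtoll("123456789012345", nullptr, 10);  // long long conversion
    long double ld = std::strtold("2.718281828", nullptr);        // long double conversion
    std::printf("%lld %Lf\n", ll, ld);
}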

Standard Template Library 

  • Support for the C++11 explicit conversion operators, initializer lists, scoped enums, and variadic templates.
  • All containers now support the C++11 fine-grained element requirements.
  • Support for these C++14 features (see the sketch after this list):
    • "Transparent operator functors" less<>, greater<>, plus<>, multiplies<>, and so on.
    • make_unique<T>(args...) and make_unique<T[]>(n)
    • cbegin()/cend(), rbegin()/rend(), and crbegin()/crend() non-member functions.
  • <atomic> received numerous performance enhancements.
  • <type_traits> received major stabilization and code fixes.
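
A brief, illustrative sketch (not from the article) showing the listed C++14 library features together:

#include <algorithm>
#include <functional>
#include <iterator>
#include <memory>
#include <vector>

int main()
{
    auto p   = std::make_unique<int>(42);              // make_unique<T>(args...)
    auto arr = std::make_unique<int[]>(8);             // make_unique<T[]>(n)
    arr[0] = *p;

    std::vector<int> v{ 3, 1, 2 };
    std::sort(v.begin(), v.end(), std::greater<>());   // transparent operator functor

    auto it = std::cbegin(v);                          // non-member cbegin()
    return *it == 3 ? 0 : 1;
}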

Visual C++ Library Enhancements

  • C++ REST SDK is added. It provides a modern C++ implementation of REST services. For more information, see C++ REST SDK.
  • C++ AMP Texture support is enhanced. It now includes support for mipmaps and new sampling modes.
  • PPL tasks support multiple scheduling technologies and asynchronous debugging. New APIs enable the creation of PPL tasks for both normal results and exception conditions.

C++ Application Performance

  • Auto-Vectorizer now recognizes and optimizes more C++ patterns to make your code run faster.
  • ARM platform and Atom micro-architecture code quality improvements.
  • __vectorcall calling convention is added. Pass vector type arguments by using the __vectorcall calling convention to use vector registers (a minimal declaration sketch follows this list).
  • New Linker Options. The /Gw (global data) and /Gy (function-level linking) compiler switches enable linker optimizations to produce leaner binaries.
  • C++ AMP shared memory support to reduce or eliminate data copying between CPU and GPU.
  • Profile Guided Optimization (PGO) enhancements:
    • Performance improvements from a reduction in the working set of apps that are optimized by using PGO.
    • New PGO for Windows Store app development.
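
A minimal declaration sketch for __vectorcall (MSVC-specific, x86/x64 with SSE2; an illustration, not production code):

#include <immintrin.h>

// Under __vectorcall, vector arguments arrive in XMM registers
// instead of on the stack.
__m128 __vectorcall add4(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}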

Windows Store App Development Support

  • Support For Boxed Types In Value structs. You can now define value types by using fields that can be null—for example, IBox<int>^ as opposed to int. This means that the fields can either have a value, or be equal to nullptr. (A brief C++/CX sketch follows this list.)
  • Richer Exception Information. C++/CX supports the new Windows error model that enables the capture and propagation of rich exception information across the application binary interface (ABI); this includes call stacks and custom message strings.
  • Object::ToString() Is Now Virtual. You can now override ToString in user-defined Windows Runtime ref types.
  • Support For Deprecated APIs. Public Windows Runtime APIs can now be marked as deprecated and given a custom message that appears as a build warning and can provide migration guidance.
  • Debugger Improvements. Support for native/JavaScript interop debugging, Windows Runtime exception diagnosis, and async code debugging (both Windows Runtime and PPL).
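
For the boxed-types bullet above, a minimal C++/CX sketch (illustrative; assumes a Windows Store project):

// A value struct with a nullable field: Z may hold an int or be nullptr.
value struct Point
{
    int X;
    Platform::IBox<int>^ Z;
};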

Diagnostics Enhancements

  • Debugger Improvements. Support for async debugging and Just My Code debugging.
  • Code Analysis Categories. You can now view categorized output from the Code Analyzer to help you find and fix code defects.
  • XAML Diagnostics. You can now diagnose UI-responsiveness and battery-usage issues in your XAML.
  • Graphics and GPU Debugging Improvements.
    • Remote capture and playback on real devices.
    • Simultaneous C++ AMP and CPU debugging.
    • Improved C++ AMP runtime diagnostics.
    • HLSL Compute shader trace debugging.

3-D Graphics Enhancements

  • Image Content Pipeline support for pre-multiplied alpha DDS format.
  • Image Editor uses internally pre-multiplied alpha for rendering, and thereby avoids rendering artifacts such as dark halos.
  • Image and Model Editors. User-defined filter creation is now supported in Shader Designer in Image Editor and Model Editor.

IDE and Productivity

The Visual Studio IDE has significant improvements to help you be more productive when you code in C++.

  • Improved Code Formatting. You can apply more formatting settings to your C++ code. By using these settings, you can control new-line placement of braces and keywords, indentation, spacing, and line wrapping. Code is automatically formatted when you complete statements and blocks, and when you paste code into a file. To modify formatting settings, on the menu bar in Visual Studio, choose Tools, Options, expand the Text Editor, C/C++, and Formatting nodes, and then make your changes. You can also use the Quick Launch box to access these options.
  • Brace Completion. C++ code now auto-completes the closing characters that correspond to these opening characters:
    • { (curly brace)
    • [ (square bracket)
    • ( (parentheses)
    • ' (single quote)
    • " (double quote)
  • Additional C++ Auto-completion Features.
    • Adds semicolon for class types.
    • Completes parentheses for raw string literals.
    • Completes multi-line comments (/* */)
  • Find All References now automatically resolves and filters references in the background after it displays the list of textual matches. To disable reference resolution, on the menu bar in Visual Studio, choose Tools, Options, expand the Text Editor, C/C++, and Advanced nodes, and change the Disable Resolving setting under References.

To modify the brace-completion settings, on the menu bar in Visual Studio, choose Tools, Options, expand the Text Editor, C/C++, and General nodes, and then make your changes. You can also change the settings for all Visual Studio languages by expanding the Text Editor, All Languages, and General nodes.

To modify specific C++ settings, on the menu bar, choose Tools, Options, expand the Text Editor, C/C++, and Advanced nodes, and then make your changes.

  • Context-Based Member List Filtering. Inaccessible members are filtered out of the IntelliSense member lists. For example, private members are not displayed in the member list unless you are modifying the code that implements the type. While the member list is open, you can press Ctrl+J to remove one level of filtering (applies only to the current member list window). You can press Ctrl+J again to remove the textual filtering and show every member.
  • Parameter Help Scrolling. The displayed function signature in the parameter-help tooltip now changes based on the number of parameters you've actually typed, rather than just showing an arbitrary signature and not updating it based on the current context. Parameter help also functions correctly when it's displayed on nested functions.
  • Toggle Header/Code File. You can now toggle between a header and its corresponding code file by using a command on the shortcut menu, or a keyboard shortcut.
  • Resizable C++ Project Properties Window.
  • Auto-generation of Event Handler Code in C++/CX and C++/CLI. When you are typing code to add an event handler in a C++/CX or C++/CLI code file, the editor can automatically generate the delegate instance and event-handler definition. A tooltip window appears when event-handler code can be auto-generated.
  • DPI Awareness Enhancement. The DPI Awareness setting for application manifest files now supports the "Per Monitor High DPI Aware" setting.
  • Faster Configuration Switching. For large applications, switching configurations—especially subsequent switching operations—executes much more quickly.
  • Build Time Efficiency -- faster builds. Numerous optimizations and multi-core utilization make builds faster, especially for large projects. Incremental builds for C++ applications that have references to C++ WinMD are also much faster.

Using the 2013 CPU Sampling Profiler to Understand C++ Compiler Optimizations


If you've ever profiled an optimized build of a C++ application, there is a good chance that you looked at the profiling report, saw some functions missing that you expected to be present, and had to assume they had been inlined without being able to be certain. Likewise, if you've ever tried to improve your application's performance using the Profile Guided Optimization (PGO) feature in Visual Studio, you were likely blind to whether your training data actually had the desired effect.

To help with this, in Visual Studio 2013 we’ve done work to help you understand when the compiler inlines functions and how your PGO training data has translated to the optimized binary. In this post I’ll walk you through how to use the Visual Studio CPU Sampling profiler to understand these two optimizations in your application.

The sample

For the purposes of this post, I’ll be evaluating a very basic application that asks the user for a maximum value, and calculates all prime numbers less than the number provided by the user (sample available here). However, to better represent a large complex application the sample is written as an executable console application that calls a dll containing the logic for calculating the prime numbers.

The main function of the console application appears as follows:

// Headers assumed by this snippet; the DLL header name is hypothetical.
#include <tchar.h>
#include <iostream>
#include <vector>
#include "PrimeCalculator.h"   // declares get_primes
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    int max;
    cout << "Pick a maximum value to compute to: ";
    cin >> max;

    vector<int> primes;
    primes = get_primes(max);
    cout << primes.size();
    return 0;
}

The prime number calculation is written in the following way

vector<int> get_primes(int max)
{
    vector<int> primes;
    for (int n = 0; n < max; n++)
    {
        if (is_prime(n)) {
            add_prime(n, primes);
        }
    }
    return primes;
}

bool is_prime(int n)
{
    if (is_base(n))
        return false;
    if (has_factors(n))
        return false;
    return true;
}

bool is_base(int n)
{
    if (n < 2)
        return true;
    return false;
}

bool has_factors(int n)
{
    for (int i = 2; i < n; ++i)
    {
        if (is_factor(n, i))
            return true;
    }
    return false;
}

bool is_factor(int n, int i){
    if ((n % i) == 0)
        return true;
    return false;
}

void add_prime(int n, vector<int> &primes)
{
    primes.push_back(n);
}

Using the profiler

The first thing I am going to do is profile the application to evaluate its performance. (For accurate performance results, you should always profile Release builds; Debug builds disable optimizations, so something that appears to be a problem in a Debug build may not be an issue in an optimized build.) To do this, I'll open the new "Performance and Diagnostics" launch page from the Debug menu (Debug -> Performance and Diagnostics). The launch page will direct me to the Performance Wizard.

[screenshot]

When the Performance Wizard opens, I just need to press “Finish” to launch my project since I’ll be taking all the default settings.

[screenshot]

After the application launches, I’ll input 100K as the number to calculate primes to and let it go. When the report opens, I can see that get_primes is the last frame in my call stack and shows that 100% of the samples are occurring in this function.

[screenshot]

So I'm left to speculate that the other functions (e.g. is_prime) were inlined, but I don't know for certain that this is what happened. It is possible that, when compiled for Release, these functions executed fast enough that no samples happened to occur while they were executing.

Understanding compiler inlining

This is where the first improvement to C++ CPU profiling in Visual Studio 2013 comes in. The C++ compiler has the ability to add inlining information to the .pdb files during compilation (it is not on by default, however, since it increases the size of the .pdb). To enable it, navigate to "Project Properties -> Configuration Properties -> C/C++ -> Command Line" and add "/d2Zi+" to the "Additional Options" field (do this for every project in the solution you want inline information for).

[screenshot]

NOTE: /d2Zi+ is not an officially documented flag in the C++ compiler, which means its future support is not guaranteed.
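
If you prefer the command line, the same flag can be passed there as well; a hedged, compile-only example (file name taken from this sample):

cl /c /O2 /Zi /d2Zi+ PrimeCalculator.cpp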

Now, rebuild the application in the Release configuration and profile again. This time, when the report opens, navigate to the "Call Tree" view.

[screenshot]

When this opens, right-click any of the column headers and choose "Add/Remove Columns" near the bottom of the context menu. When the "Add/Remove Columns" dialog opens, choose the five new columns that begin with "Inlined":

[screenshot]

  1. Inlined Functions: Shows functions that have been inlined into this function
  2. Inlined Inclusive Samples: Shows the number of samples that occurred when inlined code was executing in the current function or a descendent function
  3. Inlined Exclusive Samples: Shows the number of samples that occurred when code inlined into this function was executing
  4. Inlined Inclusive Samples %: Shows the total inlined inclusive samples for this method relative to the total number of inlined samples.
  5. Inlined Exclusive Samples %: Shows the total samples collected in code inlined into this function relative to the total number of inlined samples.

Now when I expand the hot path, I have a column that shows me functions that have been inlined into get_primes and the sample count information for samples occurring in code that was originally in a separate function.

[screenshot]

[For the purposes of the screenshot above I’ve hidden all of the default columns]

Now, if I resize the "Inlined Functions" column to make it wide enough to read all of the functions inlined into get_primes, I can see that is_prime, has_factors, and is_factor have all been inlined into it, which is why they do not appear anywhere else in my report.

Evaluating profile guided optimization

The next new feature I am going to highlight is how to use the CPU sampling profiler to understand how effective your Profile Guided Optimization (PGO) training is.

When you use PGO, you train the application by running scenarios you believe will be representative of the way that customers will use the application. During training, PGO records what code is executing and how frequently, then during compilation this information is used to help the compiler make more informed decisions about how to optimize the application.

It is important to emphasize just how important the training phase is. PGO optimizes for speed the functions in which the most time is spent, and optimizes for size the functions that are executed infrequently or that little time is spent in, to help make the binary smaller. So if you make a mistake with the training data (e.g. train for a scenario that users rarely perform), you can actually hurt your customers' performance: if you never exercise an expensive code path during training, PGO is likely to optimize that code for size, and if that code turns out to be on a hot path for the user, performance will actually be worse than if you had never PGO'd it at all.

With this in mind, the question becomes: once you collect training data and optimize the application, how can you know that the training data was collected correctly and is representative of the actual use of the application? To demonstrate how the profiler can help with this, I'm going to instrument the sample application for PGO training by right-clicking on the "PrimeNumbers" project and then choosing "Profile Guided Optimization -> Instrument".

[screenshot]

After I instrument, I choose the “Profile Guided Optimization -> Run Instrumented/Optimized Application” to start collecting my training data. Because the instrumented application will run slower than normal, I’m going to input 10K as the limit. Once the application finishes the training run, I go back to the “Profile Guided Optimization” menu and choose “Optimize” to build the optimized binary based on the training data.

Now that we've collected training data and optimized our binary based on it, let's evaluate how effective the training was at optimizing the application for the hot path. To do this, I'm going to launch the optimized application by right-clicking on the PrimeNumbers project and choosing "Profile Guided Optimization -> Run Instrumented/Optimized Application". After it launches, I attach the profiler to it by selecting "Analyze -> Profiler -> Attach/Detach" and selecting PrimeNumbers.exe.

[screenshot]

[screenshot]

[You can alternatively launch the project under the profiler, but because PGO is a different type of build than a standard release build, you will be warned about rebuilding. You need to select "Do not continue with build" or you will lose the optimized build, and then choose "Yes" when the profiler asks "Build failed, would you like to launch your performance session anyway?"]

I’ll enter 100K again as the number to calculate and when the profiling session ends, I navigate to the Call Tree view, and right click on a column header and choose “Add/Remove Columns…”. This time I select the “Is PGO” and “PGO Type” columns and move them up to be immediately below the module name in the column order.

[screenshot]

  1. Is PGO: Specifies whether the function was PGO’d or not
  2. PGO Type: Specifies whether the function was PGO’d for Speed or Size

When I expand the hot path, I see that wmain in PrimeNumbers.exe was PGO'd, but neither of the functions in PrimeCalculator were PGO'd. So I can quickly see that I failed to train PrimeCalculator.dll, because I missed instrumenting the .dll before I did the training run. While this is a simple mistake, it illustrates how the profiler can quickly show whether your training data was collected correctly.

[screenshot]

So let's try again. Using the "Profile Guided Optimization" context menu, I instrument the PrimeCalculator project, then the PrimeNumbers project, and choose "Run Instrumented/Optimized Application" to collect a new training run. I input 10K again as the limit and, when it finishes, use the "Profile Guided Optimization" menu to optimize both the PrimeCalculator and PrimeNumbers projects. Now I launch the profiler again, and this time when the report comes up, I navigate to the Call Tree view. When the hot path is expanded, you can see that get_primes now shows as having been PGO'd, and so do all of the functions I control on my hot path.

[screenshot]

Leveraging the PGO Type column

The example above is deliberately simple, to illustrate how the profiler can show you the state of your training data. The more common case where the profiler comes into play is when a customer reports a performance problem. You then either reproduce the problem and profile the scenario, or ask the customer to collect a performance trace for you. You can then open the report and see whether the functions on the hot path were PGO'd at all and, if so, whether they were optimized primarily for size or for speed. If the hot path shows no functions optimized for speed, that tells you that you need to update your training to exercise this code path more. If the code path does show functions optimized for speed, then you need to look at improving the implementation, rather than relying on PGO to provide the desired performance improvements.

At this point it is worth noting that for very small applications, such as the PrimeNumbers sample app, PGO will always optimize all of the functions for speed. An application needs to contain about 6,000 instructions before PGO will begin to make size-versus-speed decisions. To illustrate this, below I've PGO'd and then profiled the NBody sample application attached to the "Build faster and high performing native applications using PGO" blog post.

[screenshot]

Start profiling today

The above was a very basic example of how to use the profiler to understand compiler optimizations and PGO training. Feel free to download the sample project used in this blog post; hopefully you can see how to apply this to your own applications by trying the CPU profiler in Visual Studio 2013. Note that the new PGO support in the profiler requires that you profile on Windows 8 or higher, and that the report be viewed in Visual Studio 2013 Premium or higher.

For more on profiling, see the blog of the Visual Studio Diagnostics team, and ask questions in the Visual Studio Diagnostics forum.

Using Visual Studio 2013 to write maintainable native visualizations (natvis)


In Visual Studio 2012 we introduced the ability to create visualizations for native types using natvis files.  Visual Studio 2013 contains several improvements that make it easier to author visualizations for classes that internally make use of collections to store items.   In this blog post I’ll show an example scenario that we wanted to improve, show you what you have to do in VS2012 to achieve the desired results, and then show you how the natvis authoring gets easier with VS2013 by exploring some of our new enhancements.

Example scenario

Let's consider the following source code and suppose we are interested in writing a visualizer for the CNameList class:

#include <cstdio>
#include <string>
#include <vector>
using namespace std;

class CName
{
private:
    string m_first;
    string m_last;

public:
    CName(string first, string last) : m_first(first), m_last(last) {}

    void Print()
    {
        // printf's %s expects a narrow char*, so c_str() needs no casts.
        printf("%s %s\n", m_first.c_str(), m_last.c_str());
    }
};

class CNameList
{
private:
    vector<CName*> m_list;

public:
    CNameList() {}

    ~CNameList()
    {
        for (int i = 0; i < m_list.size(); i++)
        {
            delete m_list[i];
        }
        m_list.clear();
    }

    void AddName(string first, string last)
    {
        m_list.push_back(new CName(first, last));
    }


};

int _tmain(int argc, _TCHAR* argv[])
{
    CNameList presidents;
    presidents.AddName("George", "Washington");
    presidents.AddName("John", "Adams");
    presidents.AddName("Thomas", "Jefferson");
    presidents.AddName("Abraham", "Lincoln");

    return 0;
}

Our goal is to get ‘presidents’ to display like this:

 

CNameList visualizer for Visual Studio 2012

In Visual Studio 2012, it can be tricky for some to author the visualization for the CNameList class.  The most obvious natvis authoring:

<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
  <Type Name="CNameList">
    <Expand>
      <ExpandedItem>m_list</ExpandedItem>
    </Expand>
  </Type>
  <Type Name="CName">
    <DisplayString>{m_first} {m_last}</DisplayString>
    <Expand>
      <Item Name="First">m_first</Item>
      <Item Name="Last">m_last</Item>
    </Expand>
  </Type>
</AutoVisualizer>
 

Would display like this:

 

While the first and last names of the presidents are there, the view is a lot more cluttered than we may like. Since we want to visualize the content of the CNameList object rather than its implementation details, we may not care about the size or capacity of the internal vector, the memory address of each CName object in the list, or the quotes around the first and last names that indicate they are stored as separate strings. With Visual Studio 2012, removing this clutter is possible, but it is rather cumbersome and requires the visualization for CName and CNameList to be coupled with implementation details of the STL. For example, in VS2012 we could get rid of the size and capacity of the vector, as well as the memory addresses of the CName objects, by replacing the visualizer for CNameList with this:

<Type Name="CNameList"><Expand><IndexListItems><Size>m_list._Mylast - m_list._Myfirst</Size><ValueNode>*m_list._Myfirst[$i]</ValueNode></IndexListItems></Expand></Type>

And we could get rid of the quotes around the first and last names by replacing the CName visualizer with this, which uses the “,sb” format specifier to remove the quotes around the characters in the string: 

<Type Name="CName"><DisplayString Condition="m_first._Myres &lt; m_first._BUF_SIZE &amp;&amp; m_last._Myres &lt; m_last._BUF_SIZE">{m_first._Bx._Buf,sb} {m_last._Bx._Buf,sb}</DisplayString><DisplayString Condition="m_first._Myres &gt;= m_first._BUF_SIZE &amp;&amp; m_last._Myres &lt; m_last._BUF_SIZE">{m_first._Bx._Ptr,sb} {m_last._Bx._Buf,sb}</DisplayString><DisplayString Condition="m_first._Myres &lt; m_first._BUF_SIZE &amp;&amp; m_last._Myres &gt;= m_last._BUF_SIZE">{m_first._Bx._Buf,sb} {m_last._Bx._Ptr,sb}</DisplayString><DisplayString Condition="m_first._Myres &gt;= m_first._BUF_SIZE &amp;&amp; m_last._Myres &gt;= m_last._BUF_SIZE">{m_first._Bx._Ptr,sb} {m_last._Bx._Ptr,sb}</DisplayString><Expand><Item Condition="m_first._Myres &lt; m_first._BUF_SIZE" Name="First">m_first._Bx._Buf,sb</Item><Item Condition="m_first._Myres &gt;= m_first._BUF_SIZE" Name="First">m_first._Bx._Ptr,sb</Item><Item Condition="m_last._Myres &lt; m_last._BUF_SIZE" Name="Last">m_last._Bx._Buf,sb</Item><Item Condition="m_last._Myres &gt;= m_last._BUF_SIZE" Name="Last">m_last._Bx._Ptr,sb</Item></Expand></Type>
 

While these visualizations certainly work, in the sense that they yield the desired clutter-free output in the watch window, they require more work to write and maintain. First, the visualizers for both CNameList and CName now take dependencies on private members of classes in the STL. As implementation details of the STL are subject to change, these visualizers are at risk of not working in a future version of Visual Studio if the STL implementation changes something that these entries depend on. Furthermore, if CNameList is distributed as a header file that could potentially be included from any version of Visual Studio, you might need to include a separate natvis entry for CName for each implementation of the STL, and then have to update all of them any time those implementation details change in the future.

Furthermore, when the visualizer for the internal class has conditionals in it, the conditionals end up multiplying in ways that make the visualizer a mess.  For instance, the built-in visualizer for std::basic_string has two possible display string cases: 

<Type Name="std::basic_string&lt;char,*&gt;"><DisplayString Condition="_Myres &lt; _BUF_SIZE">{_Bx._Buf,s}</DisplayString><DisplayString Condition="_Myres &gt;= _BUF_SIZE">{_Bx._Ptr,s}</DisplayString><StringView Condition="_Myres &lt; _BUF_SIZE">_Bx._Buf,s</StringView><StringView Condition="_Myres &gt;= _BUF_SIZE">_Bx._Ptr,s</StringView><Expand><Item Name="[size]">_Mysize</Item><Item Name="[capacity]">_Myres</Item><ArrayItems><Size>_Mysize</Size><ValuePointer Condition="_Myres &lt; _BUF_SIZE">_Bx._Buf</ValuePointer><ValuePointer Condition="_Myres &gt;= _BUF_SIZE">_Bx._Ptr</ValuePointer></ArrayItems></Expand></Type>
 

However, because CName contains both a first name and a last name, there are now four cases instead of two, based on whether the first and last names are stored in _Bx._Buf or _Bx._Ptr. If we were to enhance CName to store middle names as well, the visualizer would be up to eight cases; the number of cases doubles for each new name you want to display. So we wanted to offer a cleaner way.

CNameList Visualizer for Visual Studio 2013

In Visual Studio 2013, you can achieve an uncluttered view of CNameList in the watch window by writing your visualizer like this:

<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
  <Type Name="CNameList">
    <Expand>
      <ExpandedItem>m_list,view(simple)na</ExpandedItem>
    </Expand>
  </Type>
  <Type Name="CName">
    <DisplayString>{m_first,sb} {m_last,sb}</DisplayString>
    <Expand>
      <Item Name="First">m_first,sb</Item>
      <Item Name="Last">m_last,sb</Item>
    </Expand>
  </Type>
</AutoVisualizer>

This visualization works by taking advantage of three new natvis features in Visual Studio 2013: multiple views of an object, a format specifier to suppress memory addresses of pointers, and the ability of format specifiers to propagate across multiple natvis entries. Let's examine all three next.

Multiple Object Views

In Visual Studio 2012, a <Type> entry can only describe one way to view an object.  For example, because the default view of a vector includes child nodes for the size and capacity, every natvis entry that wants to show a vector must either also include child nodes for the size and capacity or inline the vector visualizer into the visualizer of the type that uses the vector.

In Visual Studio 2013, each type still has only one default view, but it is now possible for a natvis entry to define additional views that can be accessed through an appropriate format specifier. For example, the Visual Studio 2013 visualization of std::vector does so like this:

<Type Name="std::vector&lt;*&gt;"><DisplayString>{{ size={_Mylast - _Myfirst} }}</DisplayString><Expand><Item Name="[size]" ExcludeView="simple">_Mylast - _Myfirst</Item><Item Name="[capacity]" ExcludeView="simple">_Myend - _Myfirst</Item><ArrayItems><Size>_Mylast - _Myfirst</Size><ValuePointer>_Myfirst</ValuePointer></ArrayItems></Expand></Type>

The <DisplayString> and the <ArrayItems> elements are always used, while the "[size]" and "[capacity]" items are excluded from a view named "simple". Normally, objects are displayed using the default view, which shows all the elements. However, the ",view" format specifier can be used to specify an alternate view, as shown in this example of a simple vector of integers. Note that "vec,view(xxx)" behaves exactly the same as the default view, because the natvis entry for vector does not contain any special behavior for a view named "xxx".
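
A rough sketch of what the watch window shows for a vector of three ints (the exact rendering may differ):

vec                  { size=3 }   // default view: expands to [size], [capacity], and the elements
vec,view(simple)     { size=3 }   // simple view: expands to the elements only
vec,view(xxx)        { size=3 }   // unrecognized view name: same as the default view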

 

If you want a natvis element to be added to, rather than removed from, a particular view, you can use the "IncludeView" attribute instead of "ExcludeView". You may also specify a semicolon-delimited list of views in the "IncludeView" and "ExcludeView" attributes if you want the attribute to apply to a set of views rather than just one. For example, this visualization will show the display text "Alternate view" using either ",view(alternate)" or ",view(alternate2)", and "Default View" in other cases.

<Type Name="MyClass"><DisplayString IncludeView="alternate; alternate2">Alternate view </DisplayString><DisplayString>Default View</DisplayString></Type>

So, going back to our example, our CNameList visualizer takes advantage of the “simple” view defined in the “vector” visualizer to eliminate the clutter of the size and capacity nodes:

<Type Name="CNameList"><Expand><ExpandedItem>m_list,view(simple)na</ExpandedItem></Expand></Type>

Skipping Memory Addresses

Visual Studio 2013 adds a new format specifier, “,na”. When applied to a pointer, the “,na” format specifier causes the debugger to omit the memory address pointed to, while still retaining information about the object. For example:
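
A rough sketch of the effect on a CName pointer in the watch window (addresses and values illustrative):

pName        0x00c8f0a0 {m_first="George" m_last="Washington"}   // default rendering
pName,na     {m_first="George" m_last="Washington"}              // memory address omitted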

In our CNameList example, we use the “,na” format specifier to hide the memory addresses of the CName objects, which are unimportant. Without the “,na” format specifier, hiding the memory addresses would have required copy-pasting and modifying the visualizer for std::vector to make it dereference the elements inside of the vector, like this:

<Type Name="CNameList"><Expand><IndexListItems><Size>m_list._Mylast - m_list._Myfirst</Size><ValueNode>*m_list._Myfirst[$i]</ValueNode></IndexListItems></Expand></Type>


It should also be noted that the “,na” format specifier is not quite the same as the dereferencing operator “*”. Even though the “,na” format specifier will omit the memory address of the data being pointed to, any available symbolic information about that address will still be displayed. For example, in the function case, “*wmain” would be a syntax error, but “wmain,na” shows the module and signature of the “wmain” function, omitting the memory address. Similarly, “&myGlobal,na” still shows you that the pointer is pointing to the symbols “int myGlobal”. The “,na” format specifier can also be used on memory addresses in the middle of a function, as illustrated in the “(void*)eip,na” example. This can make the “,na” format specifier quite attractive for visualizing stack traces that have been logged inside of objects, for debugging purposes.

Propagating Format Specifiers

Even though the “,sb” format specifier already exists in Visual Studio 2012, authoring the CName visualizer like this does not work in VS2012:

<Type Name="CName"><DisplayString>{m_first,sb} {m_last,sb}</DisplayString><Expand><Item Name="First">m_first,sb</Item><Item Name="Last">m_last,sb</Item></Expand></Type>

The reason is that “m_first” is not actually a char*, but rather an std::basic_string. Because of this, Visual Studio 2012 actually obtains the format specifier for the underlying char* from the std::basic_string visualizer, not the CName visualizer. While the use of “m_first,sb” is still legal syntax, under Visual Studio 2012, the “,sb” in CName’s visualizer actually gets completely ignored.

Meanwhile, because the visualizer for std::basic_string is designed for the common case, it uses ",s", not ",sb", causing the quotes to be included. Hence, while your intention was to get the quotes stripped out, they are actually still there. In Visual Studio 2012, the only workaround, short of changing std::basic_string's visualizer and potentially messing up other visualizations not related to CName, is to inline the contents of std::basic_string's visualizer into CName's, so the string being used with ",sb" is actually a direct char* rather than an std::basic_string.

In Visual Studio 2013, format specifiers used on visualized objects get merged with format specifiers of the object’s visualizer itself, rather than getting thrown out. In other words, in Visual Studio 2013, the “b” in “m_first,sb” propagates to the strings shown in the std::basic_string visualizer, allowing the quotes to be nicely and easily stripped out, without needing to modify or inline the visualizer for std::basic_string.

Another example of propagation of format specifiers is our new visualizer for CNameList. Even if the “,na” format specifier did exist in Visual Studio 2012, without the propagation of format specifiers, “m_list,na” would still not work, as the “,na” would simply be ignored due to std::vector’s visualizer not using “,na”. In Visual Studio 2013, the “,na” format specifier automatically propagates to the elements of the vector and things just work.

Yet another good example of propagation of format specifiers is displaying the elements of an integer collection in hexadecimal. The “,x” format specifier to display an integer in hex is already present in Visual Studio 2012, but only when applied directly to an integer value. When applied to a vector object, Visual Studio 2012 will simply ignore it, like this:

In Visual Studio 2012, showing the vector elements in hex would have required either modifying the visualizer for std::vector so that every vector would have its elements in hex, or toggling the global “hexadecimal display” option, which would cause every watch item to be formatted in hex, not just that one vector.

In Visual Studio 2013, “,x” simply propagates down to the children of the vector automatically, like this:
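
A rough sketch of the propagation (values illustrative):

vec,x        { size=3 }
    [0]      0x0000000a
    [1]      0x00000014
    [2]      0x0000001e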

Other Visualization improvements

While the above features are all that is necessary to make our CNameList example work, there are a few other natvis-related improvements that have been asked for and are worth mentioning as well:

Using the final implementation name of a class inside a display string:

In Visual Studio 2013, a natvis entry for a base class may make use of the name of the object's implementation class via the $(Type) macro inside of a <DisplayString> element. For example, if we have this source code:

class Room
{
private:
    int m_squareFeet;

public:
    Room() : m_squareFeet(100) {}

    virtual void Print() = 0;
};

class Bedroom : public Room
{
public:
    virtual void Print() { printf("Bedroom"); }

};

class LivingRoom : public Room
{
public:
    virtual void Print() { printf("Living room"); }

};

class DiningRoom : public Room
{
public:
    virtual void Print() { printf("Dining room"); }
};

int _tmain(int argc, _TCHAR* argv[])
{
    Bedroom br;
    LivingRoom lr;
    DiningRoom dr;

    br.Print();
    lr.Print();
    dr.Print();
}

We can write one visualizer for class “Room” like this:

<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
  <Type Name="Room">
    <DisplayString>{m_squareFeet}-square foot $(Type)</DisplayString>
  </Type>
</AutoVisualizer>

that will display which type of room we have, like this:

In Visual Studio 2012, achieving this display would have required creating a separate <Type> element for each type of room.

Note that the use of $(Type) is case-sensitive. $(TYPE) will not work. Also, the use of $(Type) requires the base class to contain at least one virtual function because, without the existence of a vtable, the debugger has no way to know what the object’s implementation class actually is.

Support for Circular Linked Lists

In Visual Studio 2013, the <LinkedListItems> element adds support for circular lists that point back to the head of the list to indicate termination. For example, with the following source code:

class CircularList
{
private:
    struct Node
    {
        int m_value;
        Node* m_pNext;
    };

    Node* m_pFirst;

    Node* GetTail()
    {
        if (!m_pFirst)
            return NULL;

        Node* pNode = m_pFirst;
        while (pNode->m_pNext != m_pFirst)
            pNode = pNode->m_pNext;

        return pNode;
    }
public:
    CircularList() : m_pFirst(NULL) {}

    ~CircularList()
    {
        // Walk the circular list, deleting each node exactly once.
        if (!m_pFirst)
            return;

        Node* pNode = m_pFirst;
        do
        {
            Node* pNext = pNode->m_pNext;
            delete pNode;
            pNode = pNext;
        } while (pNode != m_pFirst);
    }

    void AddTail(int i)
    {
        Node* pNewNode = new Node();

        if (m_pFirst)
            GetTail()->m_pNext = pNewNode;
        else
            m_pFirst = pNewNode;

        pNewNode->m_value = i;
        pNewNode->m_pNext = m_pFirst;
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    CircularList list;
    list.AddTail(1);
    list.AddTail(2);
    list.AddTail(3);

	return 0;
}

We can display the value of 'list' with a simple <LinkedListItems> element, like this:

<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
  <Type Name="CircularList">
    <Expand>
      <LinkedListItems>
        <HeadPointer>m_pFirst</HeadPointer>
        <NextPointer>m_pNext</NextPointer>
        <ValueNode>m_value</ValueNode>
      </LinkedListItems>
    </Expand>
  </Type>
</AutoVisualizer>

In Visual Studio 2013, the output is this:

In Visual Studio 2012, the list would be assumed to go on forever, since the “next” pointer of a list node is never NULL, thereby producing output like this:

Circular list support in Visual Studio 2013 only applies when an element's "next" pointer points directly back to the head of the list. If a "next" pointer points back to a prior node in the list that is not the head, Visual Studio 2013 will still treat the list as continuing forever, just as Visual Studio 2012 does.

Because the "next" pointer expression does not have access to fields in the underlying list object, there is no reasonable workaround for Visual Studio 2012.

In Closing

Visual Studio 2013 seeks to address the most common cases in which deficiencies in the natvis framework required compromises in the quality of the visualized view of an object and/or the maintainability of the natvis entries behind it. For feedback on this or any other diagnostics-related features, please visit our MSDN forum; I also look forward to your comments below.

C++11/14 STL Features, Fixes, And Breaking Changes In VS 2013


I'm Stephan T. Lavavej, and for the last six and a half years I've been working with Dinkumware to maintain the C++ Standard Library implementation in Visual C++.  It's been a while since my last VCBlog post, because getting our latest batch of changes ready to ship has kept me very busy, but now I have time to write up what we've done!

 

If you missed the announcement, you can download 2013 Preview right now!

 

Note: In this post, whenever I say "2013" without qualification, I mean "2013 Preview and RTM".  I will explicitly mention when things will appear between Preview and RTM.

 

Quick Summary

The STL in Visual C++ 2013 features improved C++11 conformance, including initializer lists and variadic templates, with faster compile times.  We've also implemented features from the upcoming C++14 Standard, including make_unique and the transparent operator functors.

 

Compiler Features

As a reminder, here are the C++11 Core Language features that have been added to the compiler in Visual C++ 2013 Preview (in addition to the features in Visual C++ 2012, of course):

 

* Default template arguments for function templates

* Delegating constructors

* Explicit conversion operators

* Initializer lists and uniform initialization

* Raw string literals

* Variadic templates

 

They were available in the "Visual C++ Compiler November 2012 Community Technology Preview" (and they were covered in my video walkthrough), but the compiler team has made lots of progress since then.  That is, tons and tons and tons of bugs have been fixed - many of which were reported by users of the Nov CTP (thanks!).

 

As Herb Sutter just announced in "The Future Of C++" at the Build conference (with a companion blog post), more C++11 Core Language features will be implemented in 2013 RTM:

 

* Alias templates

* Defaulted functions (except for rvalue references v3)

* Deleted functions

* Non-static data member initializers (NSDMIs)

 

(About that caveat: "rvalue references v3" is how I refer to C++11's rules for automatically generating move constructors and move assignment operators, plus the rules for suppressing automatically generated copies when moves are declared.  There simply isn't enough time between 2013 Preview and RTM for the compiler team to implement this with RTM-level quality.  As a consequence, requesting memberwise move constructors and move assignment operators with =default will not be supported.  The compiler team is acutely aware of how important this stuff is, including to the STL, and it is one of their highest priorities for post-2013-RTM.)

 

Additionally, some C99 Core Language features will be implemented in 2013 RTM:

 

* C99 _Bool

* C99 compound literals

* C99 designated initializers

* C99 variable declarations

 

Please see Herb's announcement video/post for more information, especially a post-2013-RTM C++11/14 conformance roadmap, including (among many other things) rvalue references v3, constexpr, and C++14 generic lambdas.

 

STL Features

The STL in Visual C++ 2013 Preview has been fully converted over to using the following C++11 features:

 

* Explicit conversion operators

* Initializer lists

* Scoped enums

* Variadic templates

 

In 2013 RTM, this list will be extended to:

 

* Alias templates

* Deleted functions

 

I say "converted over" because the STL has been faking (I suppose "simulating" would be a more dignified word) some of these features with library-only workarounds, with varying degrees of success, ever since Visual C++ 2008 SP1.  Going into further detail:

 

* In the Core Language, explicit conversion operators are a general feature - for example, you can have explicit operator MyClass().  However, the Standard Library currently uses only one form: explicit operator bool(), which makes classes safely boolean-testable.  (Plain "operator bool()" is notoriously dangerous.)  Previously, we simulated explicit operator bool() with operator pointer-to-member(), which led to various headaches and slight inefficiencies.  Now, this "fake bool" workaround has been completely removed.
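
For readers who haven't seen the idiom, here is a minimal sketch (mine, not the STL's actual code) of the safely-boolean-testable pattern that explicit operator bool() enables:

struct Stream
{
    bool ok;
    Stream() : ok(true) {}
    explicit operator bool() const { return ok; }   // no implicit conversions
};

int main()
{
    Stream s;
    if (s) { }           // fine: contextual conversion to bool
    // int i = s;        // error, unlike with a plain operator bool()
    return s ? 0 : 1;    // also fine
}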

 

* We didn't attempt to simulate initializer lists, although I accidentally allowed a broken/nonfunctional <initializer_list> header to ship in Visual C++ 2010, confusing users who noticed its presence.  It was removed in Visual C++ 2012 to avoid confusion.  Now that the compiler supports initializer lists, <initializer_list> is back, it actually works, and we've implemented all of the std::initializer_list constructors and other functions mandated throughout the Standard Library.

 

* Scoped enums were implemented in the Visual C++ 2012 compiler, but due to a long story involving a compiler bug with /clr, we weren't able to use them in the STL for that release.  In Visual C++ 2013, we were able to get rid of our fake scoped enums (simulated by wrapping traditional unscoped enums in namespaces, which is observably imperfect).

 

* Over the years, we've simulated variadic templates with two different systems of "faux variadic" preprocessor macros - the first system involved repeatedly including subheaders, while the second system (more elegant, as far as crawling horrors go) eliminated the subheaders and replaced them with big backslash-continued macros that were stamped out by other macros.  Functions that were supposed to be true variadic, like make_shared<T>(args...), were actually implemented with overloads: make_shared<T>(), make_shared<T>(arg0), make_shared<T>(arg0, arg1), etc.  Classes that were supposed to be true variadic, like tuple<Types...>, were actually implemented with default template arguments and partial specializations.  This allowed us to bring you make_shared/tuple/etc. years ago, but it had lots of problems.  The macros were very difficult to maintain, making it hard to find and fix bugs in the affected code.  Spamming out so many overloads and specializations increased compile times, and degraded Intellisense.  Finally, there was the infinity problem.  We originally stamped out overloads/specializations for 0 to 10 arguments inclusive, but as the amount of variadic machinery increased from TR1 through C++0x's evolution to C++11's final form, we lowered infinity from 10 to 5 in Visual C++ 2012 in order to improve compile times for most users (we provided a way to request the old limit of 10 through a macro, _VARIADIC_MAX).

 

In Visual C++ 2013, we have thoroughly eradicated the faux variadic macros.  If you want tuples with 50 types, you can have them now.  Furthermore, I am pleased to announce that switching the STL over to real variadic templates has improved compile times and reduced compiler memory consumption.  What usually happens during the STL's development is that the Standard tells us to implement more stuff, so we do, and more stuff takes more time and more memory to compile (which can be mitigated by proper use of precompiled headers).  Then the compiler's testers notice the reduced performance and complain, and I respond with a stony face and a clear conscience, because the Standard told me to.  But in this case, faux variadics were so incredibly bloated that removing them made a big observable difference.  (Parsing one variadic template isn't free, but it's a lot cheaper than parsing 6 or whatever overloads.)
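
As a concrete (and hedged) illustration, here is the classic perfectly-forwarding factory; under the faux-variadic scheme every arity had to be stamped out by macros, whereas the real feature needs a single template:

#include <memory>
#include <utility>

// Hypothetical helper, for illustration only.
template <typename T, typename... Args>
std::unique_ptr<T> my_make_unique(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}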

 

The precise numbers will change as the compiler, the STL, and its dependencies (notably the CRT and Concurrency Runtime) are modified, but when I checked in the final variadic rewrite, I ran some tests and I can share the concrete numbers.  My test included all STL headers in x86 release mode.  I didn't actually use anything, but the compiler still has to do a lot of work just to parse everything and also instantiate what the STL uses from itself, so this is a reasonable way to measure the overhead of just dragging in the STL without using it extensively.  Comparing Visual C++ 2012 (which defaulted to _VARIADIC_MAX=5) to Visual C++ 2013, preprocessed file size decreased from 3.90 MB to 3.13 MB.  Compile time decreased from 2307 ms to 2029 ms.  (I measured only the compiler front-end; the back-end took 150-250 ms without optimizations and it isn't of interest here.  Additionally, for complicated reasons that I'll avoid going into here, I excluded the relatively brief time needed to generate a preprocessed translation unit - but as you can see, counting that time would just increase the differences further.)  Finally, I measured compiler memory consumption by generating a precompiled header (a PCH is a compiler memory snapshot), which also decreased from 40.2 MB to 33.3 MB.  So, the STL is smaller and faster to compile according to every metric.  If you had to increase _VARIADIC_MAX to get your programs to compile with Visual C++ 2012, the improvements will be even more dramatic (for completeness, I measured _VARIADIC_MAX=10 as 6.07 MB, 3325 ms, and 58.5 MB respectively).

 

(Note: After I performed these measurements at the end of March 2013, the compiler team has significantly reworked some of their internal data structures.  As a result, compiler memory consumption in Preview and especially RTM may significantly differ from my original measurements.  However, that doesn't affect my conclusion: considered in isolation, using real variadic templates improves every metric.)

 

* The Standard requires a few things in the STL to be alias templates.  For example, ratio_add<ratio<7, 6>, ratio<11, 10>> is required to be an alias for ratio<34, 15>.  Previously, we simulated alias templates with structs containing "type" typedefs (sometimes named "other").  This workaround will be completely eliminated in 2013 RTM.
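
A small sketch of the observable difference; this static_assert holds only when ratio_add is a true alias template, as the Standard requires:

#include <ratio>
#include <type_traits>

static_assert(std::is_same<std::ratio_add<std::ratio<7, 6>, std::ratio<11, 10>>,
                           std::ratio<34, 15>>::value,
              "ratio_add<7/6, 11/10> must be an alias for ratio<34/15>");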

 

* The Standard declares over 100 deleted functions throughout the STL, usually to make classes noncopyable.  Previously, we simulated this with the traditional C++98/03 workaround: private unimplemented declarations.  In 2013 RTM, we will eliminate this workaround, marking free functions as =delete and making member functions public with =delete as mandated by the Standard (except type_info, for horrible reasons which this margin is too narrow to contain).

 

You may have noticed above that there are some features in the compiler list that aren't in the library list.  The reason is simple: not all Core Language features have an observable impact on the Standard Library.  For example, raw string literals don't require corresponding Standard Library implementation changes (although they make <regex> more convenient to use).  Similarly, uniform initialization (excluding initializer lists) and NSDMIs don't affect the STL.  You can say pair<int, int> p{11, 22}, but that doesn't involve any new or changed machinery on our side.  As for defaulted functions, they usually appear in the STL for moves, which (as I explained earlier) is not yet supported by the compiler.  There are a few defaulted functions other than moves, we just haven't updated them yet because implementing them "normally" has few adverse consequences.  Finally, delegating constructors and default template arguments for function templates don't observably affect the Standard Library's interface, although we're using them internally.  (Interestingly, their internal use is actually necessary - delegating constructors are the only way to implement pair's piecewise constructor, and default template arguments for function templates are the only way to constrain tuple's variadic constructor.)

 

You may also have noticed that I mentioned C++14 in the title.  That's because we've implemented the following Standard Library features that have been voted into the C++14 Working Paper:

 

* The "transparent operator functors" less<>, greater<>, plus<>, multiplies<>, etc.

* make_unique<T>(args...) and make_unique<T[]>(n)

* cbegin()/cend(), rbegin()/rend(), and crbegin()/crend() non-member functions

* The <type_traits> alias templates make_unsigned_t, decay_t, etc. (implemented in RTM but not Preview)

 

(Please note that there is an FAQ at the bottom of this post, and "Q1: Why are you implementing C++14 Standard Library features when you haven't finished the C++11 Core Language yet?" is answered there.)

 

For more information, you can read my proposals N3421 "Making Operator Functors greater<>" (Standardese voted in), N3588 "make_unique" (background), N3656 "make_unique (Revision 1)" (Standardese voted in), and my proposed resolution for Library Issue 2128 "Absence of global functions cbegin/cend" which was accepted along with the rest of N3673 "C++ Library Working Group Ready Issues Bristol 2013".  (A micro-feature guaranteeing that hash<Enum> works was also voted in, but it was implemented back in Visual C++ 2012.)

 

Walter E. Brown (who is unaffiliated with Microsoft) proposed the <type_traits> alias templates in N3546 "TransformationTraits Redux", which is unquestionably the best feature voted into C++14.  Anyone who dares to question this assertion is hereby sentenced to five years of hard labor writing typename decay<T>::type instead of decay_t<T>.
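
A quick sketch of the two spellings side by side (illustrative declarations only):

#include <type_traits>

template <typename T>
typename std::decay<T>::type old_style(T&& t);   // C++11 spelling

template <typename T>
std::decay_t<T> new_style(T&& t);                // C++14 alias template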

 

Additionally, James McNellis (who now maintains our CRT) used the transparent operator functors to reduce <algorithm>'s size by 25%.  For example, sort(first, last) and sort(first, last, comp) previously had nearly-duplicate implementations, differing only in how they compared elements.  Now, sort(first, last) just calls sort(first, last, less<>()).  This cleanup was later extended to algorithms implemented in various internal headers, plus the ones lurking in <numeric> and the member algorithms in <list> and <forward_list>.  (A small part of the compile time improvement can probably be attributed to this.)  James also contributed the fine-grained container requirements overhaul and several bugfixes mentioned below - thanks James!

 

Fixes

That's it for the features.  Now, for the fixes.  This time around, I kept track of all of the STL fixes we made between Visual C++ 2012 and 2013.  This includes both the bugs that were reported through Microsoft Connect, and the bugs that were reported internally (whether filed by myself when I notice something wrong, by our testers, or by other Microsoft teams).  My list should be exhaustive, with a few exceptions: I'm not counting bugs filed against our tests when they didn't involve broken library code, I've tried to omit the (happily, few) bugs that were introduced and then fixed after 2012 shipped, and I'm not counting bugs that were reported against the libraries but were actually resolved in the compiler (e.g. a few type traits bugs were actually bugs in the compiler hooks we depend on).  Also, only formally filed bugs are listed here.  Sometimes we notice and fix bugs that were never filed in our database (e.g. I noticed alignment_of behaving incorrectly for references and fixed that, but nobody had ever reported it).  I'm including our internal bug numbers (e.g. DevDiv#1729) in addition to Connect bug numbers when available so if anyone asks me for further details, I can easily look up the bug.

 

There were a few major overhauls:

 

* <atomic>, whose entire purpose in life is blazing speed, contained unnecessarily slow implementations for many functions.  Wenlei He from our compiler back-end (code generation) team contributed a major rewrite of <atomic>'s implementation.  Our performance on all architectures (x86/x64/ARM) should now be optimal, or very close to it.  (DevDiv#517261/Connect#770885)

 

* <type_traits> was significantly reworked in conjunction with compiler changes.  (Some type traits, like is_array, can be implemented with ordinary C++ code.  Other type traits, like is_constructible, must rely on "compiler hooks" for accurate answers.)  This fixed virtually all known type traits bugs, including is_pod, is_assignable, and other type traits behaving incorrectly for void (DevDiv#387795/Connect#733088, DevDiv#424157), is_scalar behaving incorrectly for nullptr_t (DevDiv#417110/Connect#740872), result_of not working with movable-only arguments (DevDiv#459824), the is_constructible family of type traits behaving incorrectly with references (DevDiv#517460), many type traits being incorrectly implemented in terms of other type traits (DevDiv#520660; for example, is_move_assignable<T> is defined as is_assignable<T&, T&&>, but our implementation wasn't doing exactly that), aligned_union returning insufficiently large types for storage (DevDiv#645232/Connect#781713), and alignment_of emitting spurious warnings for classes with inaccessible destructors (DevDiv#688225/Connect#786574 - fixed in RTM but not Preview).

 

* In C++98/03, life was easy.  STL containers required their elements to be copy-constructible and copy-assignable, unconditionally (C++03 23.1 [lib.container.requirements]/3).  In C++11, as movable-only types like unique_ptr and additional member functions like emplace_back() were introduced, the container requirements were rewritten to be fine-grained.  For example, if you have a list<T> and you populate it with emplace_back(a, b, c), then T needs to be destructible (obviously) and constructible from (A, B, C) but it doesn't need to support anything else.  This is basically awesome for users, but it requires lots of attention to detail from implementers.  James fixed many bugs while overhauling this, but the ones that were formally reported were vector<DefaultConstructible>(10) not compiling (DevDiv#437519), vector<T>(first, last)'s requirements being too strict (DevDiv#437541), container move constructors requiring too much from their elements (DevDiv#566619/Connect#775715), and map/unordered_map<K, V> op[] requiring V to be more than default-constructible (DevDiv#577613/Connect#776500).
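Here's a sketch of what the fine-grained requirements buy you in practice (Widget is a made-up type for illustration):

#include <list>

struct Widget {
    Widget(int, double) { }                    // constructible from (int, double)
    Widget(const Widget&) = delete;            // not copyable, and (because this
    Widget& operator=(const Widget&) = delete; // suppresses the implicit moves)
};                                             // not movable either

int main() {
    std::list<Widget> l;
    // OK: only destructibility and the (int, double) constructor are required.
    l.emplace_back(42, 3.14);
}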

 

* 2013 RTM will contain a <ratio> overhaul, implementing alias templates (DevDiv#693707/Connect#786967) and thoroughly fixing numerous bugs (DevDiv#704582/Connect#788745 was just the tip of the iceberg).

 

Then there were individual bugfixes:

 

* std::find()'s implementation has an optimization to call memchr() when possible, but find() was simultaneously calling memchr() too aggressively, leading to incorrect results (DevDiv#316853/Connect#703165), and not calling it aggressively enough, leading to suboptimal performance (DevDiv#468500/Connect#757385).  We rewrote this optimization to be correct and optimal in all cases.
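The idea behind the optimization looks roughly like this (a sketch of the concept, not the library's actual code; find_ch is a hypothetical name):

#include <cstring>

// For contiguous ranges of plain char, element-by-element comparison and
// memchr() agree, so the (much faster) memchr() can be used safely.
inline const char* find_ch(const char* first, const char* last, char value) {
    const void* hit = std::memchr(first, static_cast<unsigned char>(value),
        static_cast<std::size_t>(last - first));
    return hit ? static_cast<const char*>(hit) : last;
}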

 

* Operators for comparing shared_ptr/unique_ptr to nullptr_t were missing (DevDiv#328276).  Some comparisons compiled anyways due to the construction of temporaries, but the Standard mandates the existence of the operators, and this is observable if you look hard enough.  Therefore, we added the operators.

 

* shared_future was convertible to/from future in ways prohibited by the Standard (DevDiv#361611).  We prevented such conversions from compiling.

 

* In <random>, subtract_with_carry's streaming operator performed a "full shift" (shifting all of the bits out of an integer in a single operation), which triggers undefined behavior and doesn't actually work on ARM (DevDiv#373497).  We now follow the Standard's rules so this works on all architectures.

 

* Taking the addresses of numeric_limits' static const data members didn't work with the /Za compiler option (DevDiv#376524).  Now it does.

 

* <ostream> provided unnecessary overloads of std::endl (DevDiv#380718/Connect#730916).  While they didn't appear to be causing any problems, we eliminated them, so only the Standard-mandated overloads are present.

 

* As required by the Standard, tuple_element<I, array<T, N>> now enforces I < N (DevDiv#410190/Connect#738181).

 

* <filesystem>'s directory_iterator was returning paths that were too short (DevDiv#411531).  (Note that recursive_directory_iterator worked correctly.)  We fixed directory_iterator to follow N1975, the Filesystem V2 draft.  (Filesystem V3 is on our radar, but it won't be implemented in 2013 RTM.)

 

* <filesystem> contained a handwritten implementation called _Strcpy() (DevDiv#422681).  Now it uses the CRT's string copying functionality, and it internally passes around references to arrays for additional bounds safety.

 

* <ostream>'s op<<() worked with std::hexfloat, but <istream>'s op>>() didn't (DevDiv#425415/Connect#742775).  Now it does.

 

* In a previous release, iostreams was changed to work correctly with the /vd2 compiler option, but this machinery sometimes emitted compiler warnings.  The compiler gained a new pragma, #pragma vtordisp, and iostreams now uses this, avoiding such spurious warnings (DevDiv#430814).

 

* Due to a typo, _VARIADIC_MAX=7 didn't compile (DevDiv#433643/Connect#746478).  We fixed this typo, then later eradicated the machinery entirely.

 

* system_error::what()'s return value didn't follow the Standard (DevDiv#453372/Connect#752770).  Now it does.

 

* codecvt_one_one wouldn't compile without various using-declarations (DevDiv#453373/Connect#752773).  It no longer requires such workarounds.

 

* Due to a misplaced parenthesis, <chrono>'s duration_cast sometimes didn't compile (DevDiv#453376/Connect#752794).  We fixed the parenthesis.

 

* Our tests found an extremely obscure deadlock under /clr when holding the locale lock and throwing an exception (e.g. when a std::locale is constructed from a bogus name).  We rearranged our code to fix this (DevDiv#453528).

 

* On ARM, we realized that we were decrementing shared_ptr/weak_ptr's refcounts in a multithreading-unsafe manner (DevDiv#455917).  (Note: x86/x64 were absolutely unaffected.)  Although we never observed crashes or other incorrect behavior even after focused testing, we changed the decrements to use sequential consistency, which is definitely correct (although potentially slightly slower than optimal).

 

* The Standard doesn't guarantee that tuple_size can be used with things other than tuples (or pairs or arrays).  In particular, tuple_size<DerivedFromTuple> isn't guaranteed to work (DevDiv#457214/Connect#753773).  Instead of silently compiling and returning 0, we changed our implementation to static_assert about such cases.

 

* C++11 says that list::erase() shouldn't invalidate end iterators (DevDiv#461528).  This is a physical consequence of std::list's representation, so release mode was always correct, but our debug checks complained about invalidation.  We fixed them so they now consider end iterators to be preserved.

 

* Constructing a regex with the flags regex::icase | regex::collate resulted in case-sensitive matching, contrary to the Standard (DevDiv#462743).  Now it results in case-insensitive matching.

 

* Constructing a std::thread resulted in a memory leak reported by _CrtDumpMemoryLeaks() (DevDiv#467504/Connect#757212).  This was not a severe, unbounded leak - what happened was that one-time initialization of an internal data structure was allocating memory marked as a "normal block" (observed by the leak tracking machinery), and it also wasn't being registered for cleanup at CRT shutdown.  We fixed this by marking this allocation as a "CRT block" (which is excluded from reporting by default) and also registering it for cleanup.

 

* Calling wait_for()/wait_until() on futures obtained from packaged_tasks was returning future_status::deferred immediately, which is never supposed to happen (DevDiv#482796/Connect#761829).  Now it returns either future_status::timeout or future_status::ready as required by the Standard.

 

* C++11's minimized allocator interface doesn't require allocators to provide a nested rebind struct, but VC didn't correctly implement this (DevDiv#483844/Connect#762094).  We've fixed this template machinery to follow the Standardese, N3690 17.6.3.5 [allocator.requirements]/3: "If Allocator is a class template instantiation of the form SomeAllocator<T, Args>, where Args is zero or more type arguments, and Allocator does not supply a rebind member template, the standard allocator_traits template uses SomeAllocator<U, Args> in place of Allocator::rebind<U>::other by default. For allocator types that are not template instantiations of the above form, no default is provided."

 

* Another minimized allocator interface bug - in debug mode, std::vector was directly using an allocator instead of going through allocator_traits, which resulted in minimal allocators not working (DevDiv#483851/Connect#762103).  We fixed this, and audited all of the containers to rule out similar problems.
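For reference, here's roughly what a C++11 minimal allocator looks like (a sketch; MinAlloc is a made-up name): no rebind, no construct(), no destroy(); allocator_traits supplies the defaults, and the fixed containers consult it.

#include <memory>
#include <vector>

template <typename T>
struct MinAlloc {
    typedef T value_type;
    MinAlloc() { }
    template <typename U> MinAlloc(const MinAlloc<U>&) { }
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const MinAlloc<T>&, const MinAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const MinAlloc<T>&, const MinAlloc<U>&) { return false; }

int main() {
    std::vector<int, MinAlloc<int>> v; // works in release and debug modes alike
    v.push_back(1729);
}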

 

* A bug in the Concurrency Runtime powering std::condition_variable resulted in crashes (DevDiv#485243/Connect#762560).  ConcRT fixed this.

 

* vector<bool>, humanity's eternal nemesis, crashed on x86 with indices over 2 billion (DevDiv#488351/Connect#763795).  Note that 2 billion packed bits occupy just 256 MB, so this is entirely possible.  We fixed our math so this works.

 

* pointer_traits didn't compile with user-defined void pointers (DevDiv#491103/Connect#764717).  We reworked our implementation so this compiles (instead of trying to form void& which is forbidden).

 

* Although it was correct according to the Standard, merge()'s implementation was performing more iterator comparisons than necessary (DevDiv#492840).  We changed this (and its related implementations) to be optimal.

 

* time_put's return value was correct for char but incorrect for wchar_t, because the implementations had unintentionally diverged (DevDiv#494593/Connect#766065).  Now wchar_t works like char.

 

* C++11 says that istreambuf_iterator::operator*() should return charT by value (DevDiv#494813).  Now we do that.

 

* Constructing a std::locale from a const char * returned from setlocale() could crash, because std::locale's constructor internally calls setlocale()!  (DevDiv#496153/Connect#766648)  We fixed this by always storing the constructor's argument in a std::string, which can't be invalidated like this.

 

* <random>'s independent_bits_engine and shuffle_order_engine weren't calling their _Init() helper functions in their constructors (DevDiv#502244/Connect#768195).  That was bad.  Now we're good.

 

* pow(-1.0, complex<double>(0.5)) took an incorrect shortcut, returning NaN instead of i (DevDiv#503333/Connect#768415).  We fixed the shortcut.

 

* Constructing shared_ptr from nullptr was allocating a reference count control block, which is forbidden by the Standard (DevDiv#520681/Connect#771549).  Now such shared_ptrs are truly empty.

 

* The Standard is very strict about unique_ptr::reset()'s order of operations (DevDiv#523246/Connect#771887).  N3690 20.9.1.2.5 [unique.ptr.single.modifiers]/4: "void reset(pointer p = pointer()) noexcept; Effects: assigns p to the stored pointer, and then if the old value of the stored pointer, old_p, was not equal to nullptr, calls get_deleter()(old_p). [ Note: The order of these operations is significant because the call to get_deleter() may destroy *this. —end note ]"  We now follow the Standard exactly.
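In code, the conformant sequence looks like this (a sketch of the required behavior only; uptr_sketch, my_ptr_ and my_del_ are stand-ins for the real members):

#include <memory>

template <typename T, typename D = std::default_delete<T>>
struct uptr_sketch {
    typedef T* pointer;
    pointer my_ptr_;
    D my_del_;

    D& get_deleter() { return my_del_; }

    void reset(pointer p = pointer()) noexcept {
        pointer old_p = my_ptr_; // remember the old value
        my_ptr_ = p;             // FIRST: store the new pointer
        if (old_p != pointer())  // THEN: run the deleter, which is now free
            get_deleter()(old_p);// to destroy *this without any harm
    }
};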

 

* Interestingly, the Standard permits iostreams to be tied to themselves, but this triggered stack overflows in our implementation (DevDiv#524705/Connect#772293).  We now detect self-tying and avoid crashing.

 

* The Standard requires minmax_element() to find the last biggest element (DevDiv#532622), unlike max_element() which finds the first biggest element.  This choice is not arbitrary - it is a fundamental consequence of minmax_element()'s implementation.  We fixed minmax_element() to follow the Standard, and carefully audited it for correctness.

 

* Our implementation declared put_time() as taking tm * (DevDiv#547347/Connect#773846).  Now it takes const tm * as required by the Standard.

 

* In the Nov 2012 CTP, which was released before our Standard Library changes were ready, the compiler team shipped a "fake" <initializer_list> header.  This basically worked properly, except that it didn't have a newline at the end of the file, and that infuriated the /Za compiler option (DevDiv#547397/Connect#773888).  Now we have the "real" <initializer_list> from Dinkumware, and we've verified that there's a newline at the end.

 

* system_clock::to_time_t() attempted to perform rounding, but triggered integer overflow when given enormous inputs (DevDiv#555154/Connect#775105).  We now perform truncation, as permitted by the Standard, making us immune to integer overflow.

 

* The Standard says that forward_iterator_tag shouldn't derive from output_iterator_tag (DevDiv#557214/Connect#775231), but it did in our implementation.  We've stopped doing that, and we've changed the rest of our code to compensate.

 

* Due to an obscure compiler bug interacting with operator fake-bool(), unique_ptrs with lambda deleters weren't always boolean-testable (DevDiv#568465/Connect#775810).  Now that unique_ptr has explicit operator bool(), this bug has been completely eradicated.

 

* Between Visual C++ 2010 and 2012, we introduced a regression where parsing floating-point numbers with iostreams (e.g. "iss >> dbl") would get the last bit wrong, while the CRT's strtod() was unaffected.  We've fixed iostreams to get all the bits right (DevDiv#576315/Connect#776287, also reported as DevDiv#616647/Connect#778982 - fixed in RTM but not Preview).

 

* <random>'s mt19937 asserted that 0 isn't a valid seed (DevDiv#577418/Connect#776456).  Now it's considered valid, like the Standard says.  We fixed all of the engines to accept 0 seeds and generate correct output.

 

* <cmath>'s binary overloads weren't constrained like the unary overloads, resulting in compiler errors in obscure cases (DevDiv#577433/Connect#776471).  Although the Standard doesn't require this to work, we've constrained the binary overloads so it works.

 

* inplace_merge() is one of a few very special algorithms in the Standard - it allocates extra memory in order to do its work, but if it can't allocate extra memory, it falls back to a slower algorithm instead of failing.  Unfortunately, our implementation's fallback algorithm was incorrect (DevDiv#579795), which went unnoticed for a long time because nobody runs out of memory in the 21st century.  We've fixed inplace_merge()'s fallback algorithm and audited all fallback algorithms for correctness (including stability).  We've also added a regression test (capable of delivering a Vulcan nerve pinch to operator new()/etc. on demand) to ensure that this doesn't happen again.

 

* Due to an off-by-one error, future_errc's message() and what() didn't work (DevDiv#586551).  This was introduced when the Standard started saying that future_errc shouldn't start at 0, and we changed our implementation accordingly - but didn't notice that the message-translation was still assuming 0-based indexing.  We've fixed this.

 

* basic_regex<char>'s implementation permitted high-bit characters, either directly or in character ranges, after an old bugfix - but it rejected high-bit characters specified through regex hex escapes, either directly or in character ranges (DevDiv#604891).  Now both are permitted.

 

* The Standard requires std::functions constructed from null function pointers, null member pointers, and empty std::functions to be empty.  Our implementation was constructing non-empty std::functions storing null function pointers/etc., which would crash when invoked.  We fixed this to follow the Standard (DevDiv#617384/Connect#779047 - fixed in RTM but not Preview).

 

* pointer_traits<shared_ptr<Abstract>> didn't work due to a template metaprogramming subtlety (DevDiv#643180/Connect#781594) - the compiler really doesn't want to see abstract classes returned by value, even if we're just doing it for the purposes of decltype.  We've fixed this machinery so it works for abstract classes.

 

* In certain cases, our iterator debugging machinery was taking locks followed by no-ops, which is pointlessly slow (DevDiv#650892).  This happened in two cases: _ITERATOR_DEBUG_LEVEL=1 which is never the default, and deque.  We've fixed this so we take locks only in _ITERATOR_DEBUG_LEVEL=2 when we have actual work to protect.

 

* C++11 says that list::splice() doesn't invalidate iterators, it just transfers the affected iterators (DevDiv#671816/Connect#785388).  This is a physical guarantee, but our debug checks considered such iterators to be invalidated.  We've updated the checks so they rigorously follow C++11's rules.  (Note: forward_list::splice_after() is still affected; we plan to fix this in the future, but not in 2013 RTM.)

 

* align()'s implementation in <memory> didn't follow the Standard (DevDiv#674552).  Now it does.

 

* Lambdas capturing reference_wrappers wouldn't compile in certain situations (DevDiv#704369/Connect#788701).  Now that we've fixed conformance problems in reference_wrapper, this code compiles cleanly.

 

* std::cin no longer overheats the CPU when you hold down spacebar (xkcd#1172).

 

Breaking Changes

On that note, these features and fixes come with source breaking changes - cases where you'll have to change your code to conform to C++11, even though it compiled with Visual C++ 2012.  Here's a non-exhaustive list, all of which have been observed in actual code:

 

* You must #include <algorithm> when calling std::min() or std::max().

 

* If your code acknowledged our fake scoped enums (traditional unscoped enums wrapped in namespaces), you'll have to change it.  For example, if you were referring to the type std::future_status::future_status, now you'll have to say std::future_status.  Note that most code is unaffected - for example, std::future_status::ready still compiles.

 

* Similarly, if your code acknowledged our faux alias templates, you'll have to change it for 2013 RTM.  For example, instead of allocator_traits<A>::rebind_alloc<U>::other you'll have to say allocator_traits<A>::rebind_alloc<U>.  Interestingly, although ratio_add<R1, R2>::type will no longer be necessary and you should say ratio_add<R1, R2>, the former will continue to compile.  That's because ratio<N, D> is required to have a "type" typedef for a reduced ratio (which will be the same type if it's already reduced).

 

* explicit operator bool() is stricter than operator fake-bool().  explicit operator bool() permits both explicit conversions to bool (e.g. given shared_ptr<X> sp, both static_cast<bool>(sp) and bool b(sp) are valid) and "contextual conversions" to bool (these are the boolean-testable scenarios, e.g. if (sp), !sp, sp && whatever).  However, explicit operator bool() forbids implicit conversions to bool, so you can't say bool b = sp, and you can't say return sp; given a bool return type.
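Concretely, given shared_ptr<int> sp (this is plain C++11; demo is just an illustrative function):

#include <memory>

bool demo(std::shared_ptr<int> sp) {
    bool b1(sp);                     // OK: direct (explicit) initialization
    bool b2 = static_cast<bool>(sp); // OK: explicit conversion
    if (sp) { }                      // OK: contextual conversion
    bool b3 = !sp || (sp && b1);     // OK: contextual conversions
    // bool b4 = sp;                 // error: implicit conversion
    // return sp;                    // error: implicit conversion
    return b2 && b3;
}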

 

* Now that we're using real variadic templates, we aren't defining _VARIADIC_MAX and its unindicted co-conspirators.  We won't complain if you're still defining _VARIADIC_MAX, but it'll have no effect.  If you acknowledged our faux-variadic-template macro machinery in any other way, you'll have to change your code.

 

* In addition to ordinary keywords, STL headers now strictly forbid macroizing the context-sensitive keywords "override" and "final".

 

* reference_wrapper/ref()/cref() now strictly forbid binding to temporary objects.

 

* <random> now strictly enforces its compiletime preconditions.

 

* Various STL type traits have the precondition "T shall be a complete type".  This is now enforced more strictly by the compiler, although we do not guarantee that it is enforced in all situations.  (STL precondition violations trigger undefined behavior, so the Standard doesn't guarantee enforcement.)

 

* The STL does not attempt to support /clr:oldSyntax.

 

Frequently Asked Questions

Q1: Why are you implementing C++14 Standard Library features when you haven't finished the C++11 Core Language yet?

 

A1: That's a good question with a simple answer.  Our compiler team is well aware of the C++11 Core Language features that remain to be implemented.  What we've implemented here are C++14 Standard Library features.  Compiler devs and library devs are not interchangeable - I couldn't implement major compiler features if my life depended on it (even static_assert would take me months to figure out), and I like to think the reverse is true, although rocket scientists are probably better at pizza delivery than pizza deliverers are at rocket science.

 

Q2: Fair enough, but you mentioned "C++14 generic lambdas" earlier.  Why is your compiler team planning to implement any C++14 Core Language features before finishing all C++11 Core Language features?

 

A2: As Herb likes to say, "C++14 completes C++11".  The compiler team is pursuing full C++14 conformance, and views all C++11 and C++14 features as a unified bucket of work items.  They're prioritizing these features according to customer demand (including library demand) and implementation cost, so they can deliver the most valuable features as soon as possible.  The priority of a feature isn't affected by when it was voted into the Working Paper.  As a result, their post-2013-RTM conformance roadmap places highly valuable C++14 features (e.g. generic lambdas) before less valuable C++11 features (e.g. attributes).  Again, please see Herb's announcement video/post for more information.

 

Q3: What about the C99 Standard Library, incorporated into C++11 by reference?

 

A3: Good news - my colleague Pat Brenner worked with Dinkumware to pick up substantial chunks of C99 and integrated them into Visual C++ 2013's CRT.  We're not done yet, but we're making progress.  Unfortunately, I didn't have time to deal with the corresponding wrapper headers in the STL (<cmeow> wrapping <meow.h>).  Time was extremely limited, and I chose to spend it on getting the variadic template rewrite checked in.  I may be able to get the wrapper headers into 2013 RTM, but I cannot promise that yet.

 

Q4: Will these compiler/library features ship in Visual C++ 2012 Update N, or will we have to wait for Visual C++ 2013?

 

A4: You'll have to wait for Visual C++ 2013.  I know this isn't what most people want to hear, so please allow me to explain.

 

I'm a programmer, and if you're reading this, I assume you're a programmer too.  So, as programmers, let's look at the following diff together.  This is how <tuple> has changed from Visual C++ 2012 to 2013, as viewed through our internal diff tool "odd":

[Screenshot: side-by-side diff of <tuple> between Visual C++ 2012 and 2013]

Pay special attention to the visual summary on the left.  In technical terms, this diff is a horrifying monstrosity.  <tuple>'s code is now wonderful, but the changes required to get to this point were basically a complete rewrite.  <functional> and the other headers received similar (but lesser) changes.

 

The VS Update mechanism is primarily for shipping high-priority bugfixes, not for shipping new features, especially massive rewrites with breaking changes (which are tied to equally massive compiler changes).

 

Major versions like Visual C++ 2013 give us the freedom to change and break lots of stuff.  There's simply no way we can ship this stuff in an Update.

 

Q5: What about the bugfixes?  Can we get those in an Update?

 

A5: This is an interesting question because the answer depends on my choices (whereas in the previous question, I wouldn't be allowed to ship such a rewrite in an Update even if I wanted to).

 

Each team gets to choose which bugfixes they take to "shiproom" for consideration to be included in an Update.  There are things shiproom won't let us get away with (e.g. binary breaking changes are forbidden outside of major versions), but otherwise we're given latitude to decide things.  I personally prioritize bandwidth over latency - that is, I prefer to ship a greater total number of bugfixes in every major version, instead of shipping a lesser total number of bugfixes (over the same period of time) more frequently in multiple Updates.

 

Going into more detail - backporting fixes takes a nonzero amount of time (especially as branches diverge due to accumulated changes).  It's also riskier - as you may have heard, C++ is a complicated world, and apparently simple changes can have unintended consequences.  Even if it's as small as emitting a spurious warning, we really don't want Updates to break stuff.  Fixing stuff in major versions gives us time to fix the fixes and get everything exactly right.

 

We do backport STL fixes from time to time (e.g. in Visual C++ 2010 RTM there was a std::string memory leak caused by the Small String Optimization interacting badly with move semantics - that was really bad, so we backported the 3-line fix from 2012 to 2010 SP1), but it's rare.

 

Q6: Are you targeting the C++11 International Standard, or a C++14 Working Paper?

 

A6: We're usually targeting the current Working Paper (N3690 as of right now), because of the Core/Library Issues that have been resolved since C++11.  In the STL, I consider any nonconformance to the current Working Paper to be a bug.

 

Q7: How do you pronounce "tuple"?

 

A7: It's "too-pull".  It does not rhyme with "supple"!

 

Thanks for reading this extremely long post, and I hope you enjoy using VS 2013's STL.

 

Stephan T. Lavavej

Senior Developer - Visual C++ Libraries

stl@microsoft.com

Improved exception reporting for C++ Windows Store Apps in Visual Studio 2013


Windows 8.1 and Visual Studio 2013 come with improvements for exception reporting in the platform and the debugger which will make it easier for native Windows Store App developers to diagnose errors in their applications. In this post, I’ll discuss a few of those improvements that are available in Visual Studio 2013 and show the differences in the debugging experience between Visual Studio 2012 and Visual Studio 2013.

The sample project that contains the code snippets in this post can be downloaded from here.

 

Captured Stacks for Windows Runtime Exceptions

Windows 8.1 adds support for capturing stack traces for exceptions as they are reported in Windows Runtime components. The debugger in Visual Studio 2013 can display those captured stack traces whenever they are available on exception objects that are derived from Platform::Exception.

To see this in action, start debugging the sample application in this post in Visual Studio 2013, put a breakpoint in the catch block below, and then click the ‘Throw Handled’ button in the main application page. It will execute the event handler code below, which simply catches an exception that is thrown inside the ThrowDataReaderError method.

void ExceptionSample::MainPage::ThrowDataReaderError()
{
    DataReader^ dataReader = ref new DataReader(nullptr);
    dataReader->LoadAsync(100);
}

void ExceptionSample::MainPage::btnThrowHandled_Click_1(Platform::Object^ sender, Windows::UI::Xaml::RoutedEventArgs^ e)
{
    try
    {
        ThrowDataReaderError();
    }
    catch (Platform::Exception^ e)
    {
        OutputTextBlock->Text = e->Message;
    }
}

The COM exception raised inside the LoadAsync method causes execution to break into the debugger with the exception dialog showing its details. The Visual Studio 2012 version of the dialog shows only the WinRT error information for this exception:

Visual Studio 2013 version of the dialog displays the additional stack trace information for the same exception:

The frames that read “[External Code]” indicate that the ‘Just My Code’ setting is enabled in the debugger, so that only the user code portion of the exception stack is shown. (Just My Code is new for native debugging in Visual Studio 2013, and more details about the feature can be found in the blog post “Just My Code for C++ in VS 2013”). There is also a new link at the bottom, “Add exception stack trace to watch”, which adds the new $exceptionstack pseudovariable to the watch window. This new pseudovariable is further explained in the next section.

If you continue execution and hit the breakpoint inside the catch block, you can inspect the exception object. Visual Studio 2012 has the following details for the exception:

whereas Visual Studio 2013 has the additional stack trace information:

In addition to just being able to view the stack frames for the exception, you can navigate to source code for a frame in the exception stack by using the “Go to Source Code” option in the context menu:

 

$exceptionstack

$exceptionstack is a new pseudovariable that can be used in the debugger variable windows (Locals, Watch, QuickWatch etc.). It is available for C++ Windows Store Apps and displays the captured stack of the most recent exception on the current thread.

To see how it works, start the sample application under the debugger again and this time click “Throw Unhandled” button. The event handler code for this button calls the same ‘ThrowDataReaderError’ method as in the previous example but doesn’t handle the error raised by the method. After continuing execution on the first-chance exception, you are going to end up in the auto-generated unhandled exception event handler:

With Visual Studio 2012, there is no means for the developer to retrieve the original call stack of the exception at this point. With Visual Studio 2013, you can add the $exceptionstack variable to the watch window to see the last captured stack which allows you to understand how you ended up at that unhandled exception event handler:

Just as you can navigate to individual frames for an exception object’s captured stack, you can go to source for the frames inside the $exceptionstack variable. Also, when there is an exception stack available, the Locals window will automatically have the $exceptionstack variable added to it.

 

Task creation stacks

For those exceptions that happen inside tasks and go unobserved, there is one more piece of information that can be important to capture (especially if an exception stack trace is unavailable), which is the task creation stack.

If you run the sample application under the debugger and click on the “throw inside task” button, the following code snippet is run, in which the dataReader object attempts to read past the end of the file (dataReader->ReadString(100)):

void ExceptionSample::MainPage::ThrowDataReaderErrorInsideTask()
{
    create_task(KnownFolders::DocumentsLibrary->CreateFileAsync(L"foo.txt",
        CreationCollisionOption::ReplaceExisting)).then([this](StorageFile^ file)
    {
        create_task(file->OpenAsync(FileAccessMode::Read))
            .then([this](IRandomAccessStream^ readStream)
        {
            DataReader^ dataReader = ref new DataReader(readStream);
            create_task(dataReader->LoadAsync(100))
                .then([this, dataReader](unsigned int numBytesLoaded)
            {
                String^ fileContent = dataReader->ReadString(100);
                OutputTextBlock->Text = fileContent;
                delete dataReader;
            });
        });
    });
}

void ExceptionSample::MainPage::btnThrowInsideTask_Click(Platform::Object^ sender, Windows::UI::Xaml::RoutedEventArgs^ e)
{
    ThrowDataReaderErrorInsideTask();
}

When this exception goes unobserved, the debugger will stop inside the destructor for the ExceptionHolder object in ppltasks.h:

This object contains valuable information for the exception that went unobserved inside the task and can be viewed in the watch window. Visual Studio 2012 shows the instruction pointer for the top of the creation stack frame (_M_disassembleMe) and requires the user to search for the location in the disassembly window:

In Visual Studio 2013, _M_disassembleMe is replaced with _M_stackTrace, which holds the full stack trace information for the task creation stack, as shown below. The developer can easily navigate to the source code for a frame listed in the creation stack.


In Closing

I hope you appreciate these improvements in Visual Studio 2013 and Windows 8.1 for diagnosing issues when platform exceptions get thrown. Feel free to download and play with the sample project. I look forward to your feedback in the comments below or in our MSDN Diagnostics Forum.

 

Optimizing C++ Code : Constant-Folding


If you have arrived in the middle of this blog series, you might want instead to begin at the beginning.

This post examines Constant-Folding – one of the simplest optimizations performed by the VC++ compiler.  In this optimization, the compiler works out the result of an expression while it is compiling (at “compile-time”), and inserts the answer directly into the generated code.  This avoids the cost of performing those same calculations when the program runs (at “runtime”).

Here is an example.  A tiny main function, stored in the file App.cpp:

int main() { return 7 + 8; }

First, some admin about this blog: 

  • We will build programs from the command-line (rather than from within Visual Studio)
  • We will use Visual Studio 2012.  In particular, the version of the compiler that generates x64 code (rather than code for the aging x86 architecture), and that compiles on an x64 computer (i.e., “x64 on x64”)

If you want to follow along, please follow the instructions found here:  essentially, you just need to choose the right variant from a list of possible “Visual Studio Tools”.

(Note that if you are using the free compiler from Visual Studio Express, it runs only on x86, but will happily generate code for x64: “x64 on x86”.  This is equally good for experiments)

We can build our sample program with the command:  CL /FA App.cpp.  The /FA switch creates an output file holding the assembly code that the compiler generates for us.  You can display it with: type App.asm to show:

 

PUBLIC  main
_TEXT   SEGMENT
main    PROC
        mov     eax, 15
        ret     0
main    ENDP
_TEXT   ENDS
END

The point of interest is the mov eax, 15 instruction – it just inserts the value 15 into the EAX register (which, by definition of the x64 calling standard, is the way that an x64 function sets the int value it will return, as the function’s result, to its caller).  The compiler does not emit instructions that would add 7 to 8 at runtime.  They would have gone something like:

PUBLIC  main
_TEXT   SEGMENT
main    PROC
        mov     eax, 7
        add     eax, 8
        ret     0
main    ENDP
_TEXT   ENDS
END

(Note carefully, the last instruction, in both snippets, ret 0.  This means: return control to the caller, and pop zero bytes from the stack.  Do not be misled into thinking this means return the value 0 to the caller!)

Let me guess: you are probably thinking “this is all very well, but really!  What kind of idiot would ever write code that includes arithmetic like 7 + 8”?  Of course, you are right.  But the compiler does see such constructs, often as a side-effect of macros.  Here is an example to persuade you that Constant-Folding is a worthwhile optimization:

#define SECS_PER_MINUTE  60
#define MINUTES_PER_HOUR 60
#define HOURS_PER_DAY    24

enum Event { Started, Stopped, LostData, ParityError };

struct {
    int        clock_time;
    enum Event ev;
    char*      reason;
}   Record;

int main() {
    const int table_size = SECS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY * sizeof Record;
    // rest of program
}

Here we are going to create a table big enough to hold a Record for each second of an entire day.  So table_size would be the size, in bytes, of that table.  It’s easy to check the assembly instruction that sets the variable table_size :

        mov     DWORD PTR table_size$[rsp], 1382400     ; 00151800H

No multiply instructions here! – it’s all calculated at compile-time: 60 * 60 * 24 * 16 = 1382400.

In fact, if we could peek inside the compiler, we would find that this level of Constant-Folding is so simple, it’s performed by the FrontEnd.  It does not require the heavy lifting power of the BackEnd Optimizer.  And therefore, it’s always on.  It makes no difference whether you request optimization (with /O2) or disable optimization (with /Od) – this optimization always cuts in.

How complicated can the expression be, and yet we still fold those constants together at compile-time?  In fact, the FrontEnd will cope with pretty much any arithmetic expression involving constants (even values such as sizeof, as above, so long as they can be evaluated at compile-time) and the operators + - * / % << >> ++ and --.  You can even throw in bools, logical operators, ifs and ?:
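For instance, everything below folds to a single constant in the FrontEnd, even under /Od (a sketch, assuming a 4-byte int):

int main() {
    // 10 * (1 << 4) % 7 + 1  ==  160 % 7 + 1  ==  6 + 1  ==  7
    const int folded = (sizeof(int) == 4 ? 10 : 20) * (1 << 4) % 7 + (3 > 2);
    return folded;   // compiles to: mov eax, 7
}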

Are there any cases of Constant-Folding, then, that do require the power of the BackEnd Optimizer?  Yes.  Consider this example:

int bump(int n) { return n + 1; }

int main() { return 3 + bump(6); }

With the commands, cl /FA /Od App.cpp which says "no optimizations, thank you" and type App.asm, we get:

mov     ecx, 6
call    ?bump@@YAHH@Z                           ; bump
add     eax, 3

Just as we would expect: load 6 into ECX – which holds the first argument, in the x64 calling convention, to our function bump.  Then call bump.  Its result is returned in EAX.  Finally, add 3 into EAX.

Let’s see what happens if we request optimization, with: cl /FA /O2 App.cpp

mov     eax, 10

Here the BackEnd Optimizer has recognized that the bump function is so small that its body should simply be included into its caller (a sophisticated optimization called “function inlining” that we will examine later in the series).  It then realizes it can evaluate the entire expression at compile-time, and so ends up with a single instruction.  Quite impressive, right?

MFC support for MBCS deprecated in Visual Studio 2013


Hello, I’m Pat Brenner, a developer on the Visual C++ Libraries team. In this blog post I want to share some information about the Microsoft Foundation Class (MFC) Library, and in particular the support of the multi-byte character set (MBCS) in MFC.

MFC has many features that support building desktop apps, and MFC has supported both Unicode and MBCS for many years. However, because Unicode is so popular, and because our research shows significantly reduced usage of MBCS, we are deprecating MBCS support in MFC for Visual Studio 2013. This keeps MFC more closely aligned with the Windows SDK itself, because many of the newest controls and messages are Unicode only. A warning to this effect has been added to MFC, so when an application is built using MBCS, a deprecation warning is issued. This warning can be eliminated by adding the NO_WARN_MBCS_MFC_DEPRECATION preprocessor definition to your project build definitions.

MFC is a very large library and its binary components (static and dynamic libraries and PDBs) form a large part of the total size of the Visual C++ product. The size of the MFC libraries substantially increases both download size and install time (in full install and update scenarios). In part this is because there are so many flavors of the MFC libraries: Debug/Release, Unicode/MBCS, Static/Dynamic. To address this, the MBCS libraries will only be available via a separate download, which is available here.

The goal is to remove MBCS support entirely in a subsequent release. MFC would then support only Unicode. We are interested in hearing feedback about this decision, so if you have comments, please take the time to leave a response to this article. Are you using MBCS in MFC? If so, what is the reason, and is there a reason you have not converted your application to Unicode?

We’re committed to supporting MFC and making sure that applications built with MFC will run on future Windows platforms. I hope you find this information useful and reassuring.

Pat Brenner, Visual C++ Libraries Development Team

Intercepting HTTP Request/Response using C++ Rest HTTP Library


We released the C++ REST SDK (codename "Casablanca") as an open source project on CodePlex in Feb 2013. It enables writing modern, asynchronous C++ code that can connect with REST services.

Using the C++ REST SDK, you can create an HTTP client that can connect to HTTP server, send requests and handle responses. The following links are pre-requisites to get familiar with the C++ Rest SDK.

The C++ Rest HTTP Library offers an interesting feature that allows you to intercept your HTTP requests and responses before they are sent out on the wire. This is useful for many scenarios and allows application developers to write custom handling of messages in one common place.

In this blog post, I will describe the intercepting technique in detail and will walk through few examples/scenarios where this can be used.

"Stages" of Http client message pipeline

Intercepting HTTP requests is achieved by defining "stages" on the http client that run before the request is sent. These stages form the full pipeline for the http client. If a client defines 3 new stages for a request, the actual request will be processed by each of the stages in order before being passed on to the last stage. The last stage (which is implemented internally by the http_client) interacts with the lower-level communication layers to actually send the message on the network. When creating a client instance, an application may add pipeline stages in front of the already existing stages.

A stage is defined by adding a handler using the http_client.add_handler() API. This API has two overloads:

  1. Pass in a std::function object as the handler. The following ensures that the lambda "foo" gets called before the message is actually sent:

    http_client.add_handler(foo)

  2. Implement a pipeline stage by deriving from the http_pipeline_stage class and pass it to the add_handler method.

The scenarios that we will discuss below will demonstrate how to use these overloads.

A handler can do 2 things:

  1. It can short-circuit the HTTP request and avoid sending the message over the network. The handler in this case should return a task representing the eventual response.
  2. It can do some extra processing of the message. For example, the handler may update counters, modify request/response headers, and so on. In this case, the handler should call the next stage.

One can similarly define "stages" to intercept the response received.

Scenario 1: Adding a stage to http_client request processing

Let us go through this with an example. Consider a client that uploads data to a service. However, this service has a limitation that it only accepts JSON data.

If a user sends XML or binary data in a PUT request, the service will reject the request. One way of avoiding this round-trip is to add a step to the request processing pipeline that checks the content-type and decides whether to continue with the request. In the code snippet below, we have a content_handler lambda that does this for us:

This handler will check the content-type header and pass the request to the next stage only if the content-type is "application/json". The second input parameter to the handler is the next_stage. For any other content-types, it fails the request immediately by replying with the BadRequest (400) HTTP code.

Note: In the below snippet, we are only checking for the standard JSON MIME type. Sites can use other JSON MIME types too.

    auto content_handler =
        [](http_request request, std::shared_ptr<http_pipeline_stage> next_stage) -> pplx::task<http_response>
    {
        auto content_type = request.headers().content_type();

        if (0 != content_type.compare(L"application/json"))
        {
            // Short-circuit the HTTP request: construct a response object with status code = BadRequest
            // and return a task containing the response.
            http_response response;
            response.set_status_code(status_codes::BadRequest);
            return pplx::task_from_result(response);
        }
        else
        {
            // Content type is JSON, so call the next pipeline stage to send the request
            return next_stage->propagate(request);
        }
    };

    http_client client(L"http://localhost:60009");
    client.add_handler(content_handler);

    client.request(methods::PUT, L"jsonentry1", filebuf.create_istream(), L"application/json")
        .then([](http_response response)
    {
        // Print the status code.
        std::wostringstream ss;
        ss << L"Server returned status code " << response.status_code() << L'.' << std::endl;
        std::wcout << ss.str();
    });

Scenario 2: Adding multiple stages to the client:

You can always add more than one handler; they will be executed in the order in which they were added.

Say that, continuing the previous example, the client wants to add its version to every outgoing request. You can define a stage that adds a custom HTTP header named "AppVersion" with the client's current version.

We will be using the following overload of http_client.add_handler to achieve this.

http_client.add_handler(std::shared_ptr<http_pipeline_stage>)

The add_custom_headers stage extends http_pipeline_stage. During request processing, the http client runs this stage against the given request and passes it on to the next stage. Each stage has a reference to the next stage available via http_pipeline_stage::next_stage.
Implement the http_pipeline_stage::propagate() method to add your custom functionality. In the example below, this stage adds two new headers, "AppVersion" and "ClientLocation", to the request message.

If the request was short-circuited by the content_handler stage, the add_custom_headers stage will not be called. Only when the content_handler propagates the request to the next stage will the headers be added.

    // Pipeline stage used for adding custom headers
    class add_custom_headers : public http_pipeline_stage
    {
    public:
        virtual pplx::task<http_response> propagate(http_request request)
        {
            request.headers().add(L"AppVersion", L"1.0");
            request.headers().add(L"ClientLocation", L"Redmond");

            return next_stage()->propagate(request);
        }
    };

    http_client client(L"http://localhost:60009");
    client.add_handler(content_handler);
    std::shared_ptr<http_pipeline_stage> custom_stage = std::make_shared<add_custom_headers>();
    client.add_handler(custom_stage);

Scenario 3: Intercepting HTTP response messages

In the above two illustrations, you saw how we can intercept the request pipeline. We can do the same with responses too. The client can add a new pipeline stage that will be executed before passing the response to the application. This stage can also modify the response received from the server.

The response_count_handler below increments a counter for each response received and also adds a new header to the response. The propagate() call returns a task of the response; its continuation can modify the response and return the updated one to the application.

Note that since multiple requests can be made simultaneously, access to shared data from within the handler must be synchronized. This can impact the performance and scalability of the application, which is a concern especially when implementing handlers at the http_listener side. Hence, it is recommended to implement purely functional handlers: they should take a message, manipulate it and pass it to the next stage.

    volatile long response_counter = 0;

    auto response_count_handler =
        [&response_counter](http_request request, std::shared_ptr<http_pipeline_stage> next_stage) -> pplx::task<http_response>
    {
        return next_stage->propagate(request).then([&response_counter](http_response resp) -> http_response
        {
            // Use synchronization primitives to access data from within the handlers.
            ::_InterlockedIncrement(&response_counter);
            resp.headers().add(L"ResponseHeader", L"App");
            return resp;
        });
    };

    http_client client(L"http://localhost:60009");
    client.add_handler(response_count_handler);

Scenario 4: Testing the HTTP client

This feature can be very useful for local testing as well. Instead of setting up a test server, you can add a stage that performs the server side validation at the client. These test hooks can reduce the test setup overhead significantly. For example: we can add a test handler that verifies a key and replies with the Forbidden (403) status code if the key is incorrect, thus performing a server side authentication step at the client itself. 

    auto test_auth_stage =
        [](http_request request, std::shared_ptr<http_pipeline_stage> next_stage) -> pplx::task<http_response>
        {
            if (is_valid_key(request.headers()))
            {
                return next_stage->propagate(request);
            }
            else
            {
                http_response response;
                response.set_status_code(status_codes::Forbidden);
                return pplx::task_from_result(response);
            }
        };

    http_client client(L"https://www.cpprestsite.com");
    client.add_handler(test_auth_stage);

When can this come in handy?

These handlers are good for adding extra processing at the level of HTTP messages. Some examples of when these can be used:

  1. Logging purposes or to maintain counters for the number of requests and responses
  2. To add or modify the HTTP request or response headers.
  3. Perform some validation of the request before the actual processing. For example, you can check the authentication key and reject unauthorized requests. This way, you can avoid unnecessary roundtrips.
  4. Local testing: The handlers can act as test hooks and perform server side processing locally, without actually setting up a server.

Any feedback or comments are welcome either below or at our codeplex forum.

Kavya Kotacherry,

Visual C++ QA


C++ REST SDK 1.1.0 is now available


C++ REST SDK 1.1.0 has been released on CodePlex.  

This version of the C++ REST SDK reintroduces the much-anticipated HTTP Listener library. Developers can now complete their REST story by creating their own server to respond to requests sent by client applications. For more information about this release and to view the sources, visit the project site at http://casablanca.codeplex.com

Project member Kavya Kotacherry has published a blog post about HTTP pipeline stages. Visit her post to learn more about some of the interesting features of the HTTP Client library.

Did you know that the C++ REST SDK 1.0 is included in Visual Studio 2013 Preview? Download VS 2013 Preview today and use the extension manager to start building your connected applications. 

Introducing ‘Vector Calling Convention’


Introduction


In VS2013 (download here), we have introduced a new calling convention known as 'Vector Calling Convention'. Before we introduce it properly, let us take a look at the lay of the land today.

There are multiple calling conventions that exist today, especially on the x86 platform; the x64 platform has been blessed with only one. The following calling conventions are supported by the Visual C/C++ compiler on x86: __cdecl, __clrcall, __stdcall, __fastcall and others. __cdecl is the default calling convention for C/C++ programs on x86, while x64 just uses the __fastcall calling convention.

 The types __m64, __m128 and __m256 define the SIMD data types that fit into the 64-bit mmx registers, the 128-bit xmm registers, and the 256-bit ymm registers, respectively. The Microsoft compiler has partial support for these types today.

 Let us take a look at an example to understand what this (i.e. partial support) really means:  


                                                  
Figure 1: arguments are passed by reference implicitly on x64.

Today on the AMD64 target, vector arguments passed by value (such as __m128/__m256) must be turned into arguments passed by address of a temporary buffer (i.e. $T1, $T2, $T3 in the figure above) allocated in the caller's local stack, as shown in the figure above. We have been receiving increasing concerns about this inefficiency in past years, especially from the game, graphics, video/audio, and codec domains. A concrete example is the MS XNA library, in which passing vector arguments is a common pattern in many APIs of the XNAMath library. The inefficiency will be intensified on upcoming AVX2/AVX3 and future processors with wider vector registers.

On x86, the convention is a little more advanced: the first 3 vector arguments passed by value are passed in registers XMM0:XMM2. However, a 4th or subsequent vector argument is not allowed and causes error C2719. Developers today are forced to manually turn it into a pass-by-reference argument to get around this limitation on x86.

 

How to make use of Vector Calling Convention?

 
The new calling convention focuses on utilizing vector registers for passing vector type arguments. With Vector Calling Convention, the design consideration was to avoid creating a totally different convention and to stay compatible with the existing convention for integer and floating point arguments. This design consideration was further extended to avoid changing the stack layout or dealing with padding and alignment. Please note that the vector calling convention is only supported for native amd64/x86 targets; it does not apply to the MSIL (/clr) target.

The new calling convention can be triggered in the following two ways: 
 

  • __vectorcall: Use the new __vectorcall keyword to control the calling convention of specific functions. For example, take a look at figure 2 below: 

                                        Figure 2: '__vectorcall' denotes the use of Vector Calling Convention

  • The other way vector calling convention can be used is if the /Gv compiler switch is specified. Using the /Gv compiler option causes each function in the module to compile as vectorcall unless the function is declared with a conflicting attribute, or the name of the function is main.  

In addition to SIMD data types, Vector Calling Convention can also be used for the Homogeneous Vector Aggregate data type (HVA) and the Homogeneous Float Aggregate data type (HFA). An HVA/HFA is a composite type in which all of the members have the same fundamental data type, which must be a vector or floating-point type (__m128, __m256, float/double). An HVA/HFA data type can have at most four members. Some examples of what constitutes an HVA/HFA data type are listed below.

                                        
                                                    Figure 3: HVA/HFA examples

 For both architectures (x86 and amd64), HVA/HFA arguments will be passed by value in vector registers if the unallocated volatile vector registers (XMM0:XMM5/YMM0:YMM5) are sufficient to hold the entire aggregate set.  Otherwise they will be passed by reference, the same way as in the existing convention.  The return value of type HVA/HFA is returned via XMM0(/YMM0):XMM3(/YMM3), one register per element.
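Here is a small sketch of what this looks like in source code (my example, not from the product; it assumes an x64 target and the SSE intrinsics from <immintrin.h>, and hva2, lerp and first are made-up names):

#include <immintrin.h>

// An HVA: at most four members, all of the same vector type.
struct hva2 { __m128 x; __m128 y; };

// With __vectorcall, a, b and t arrive in XMM0:XMM2 instead of being
// spilled to temporaries that are passed by address.
__m128 __vectorcall lerp(__m128 a, __m128 b, __m128 t)
{
    return _mm_add_ps(a, _mm_mul_ps(t, _mm_sub_ps(b, a)));
}

// An HVA argument is passed by value in vector registers when enough of
// XMM0:XMM5 remain unallocated; otherwise it is passed by reference.
__m128 __vectorcall first(hva2 h)
{
    return h.x;
}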

 

Vector Calling Convention /Disasm

 
Now that we understand a little about what Vector Calling Convention is really about, let us take a look at the disassembly generated with the use of Vector Calling Convention, shown in the figure below.

 

As you can see, the disassembly generated as a result of using Vector Calling Convention is simplified. The numbers of instructions with and without vector calling convention are displayed below.


In addition to the instructions saved, using vector calling convention also saves the stack allocation (96 bytes for $T1, $T2 and $T3), which adds to the general goodness and results in performance gains.

 

Wrap Up

 
This blog post should provide you an introduction to what Vector Calling Convention is all about. As you can observe, there is a lot of goodness in using this convention if you perform a lot of vector calculations in your code, especially on the x64 platform. One quick way of validating the performance gain from Vector Calling Convention for vector code, without changing the source code, is by using the /Gv compiler switch. At this point you should have everything you need to get started! Additionally, if you would like us to blog about some other compiler technology, please let us know; we are always interested in learning from your feedback.

 

C99 library support in Visual Studio 2013


Hello, I’m Pat Brenner, a developer on the Visual C++ Libraries team. In this blog post I want to share some information about the C99 support added to the C run-time library in Visual Studio 2013.

To summarize, we added declarations and implementations for missing functions in the following headers: math.h, ctype.h, wctype.h, stdio.h, stdlib.h, and wchar.h. We also added the new headers complex.h, stdbool.h, fenv.h, and inttypes.h, and added the implementations for all the functions declared in them. In addition, we added the new C++ wrapper headers (ccomplex, cfenv, cinttypes, ctgmath) and updated a number of others (ccomplex, cctype, clocale, cmath, cstdint, cstdio, cstring, cwchar, and cwctype).

Most of this work (all the C headers except stdbool.h and fenv.h) was done in time for the Visual Studio 2013 Preview release and is available with it, but the remainder (stdbool.h, fenv.h and the C++ wrapper headers) has been done for Visual Studio 2013 RTM and will be available with that release.

In more detail, these are the declarations and implementations we added, grouped by the headers that declare them:

  • math.h: 
    • float_t, double_t, fpclassify, isfinite, isinf, isnan, isnormal, signbit
    • HUGE_VALF, HUGE_VALL, INFINITY, NAN, MATH_ERRNO, MATH_ERREXCEPT
    • FP_INFINITE, FP_NAN, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, FP_ILOGB0, FP_ILOGBNAN
    • acosh, acoshf, acoshl, asinh, asinhf, asinhl, atanh, atanhf, atanhl
    • exp2, exp2f, exp2l, expm1, expm1f, expm1l
    • ilogb, ilogbf, ilogbl, logb, logbf, logbl, log1p, log1pf, log1pl, log2, log2f, log2l
    • scalbn, scalbnf, scalbnl, scalbln, scalblnf, scalblnl
    • cbrt, cbrtf, cbrtl, erf, erff, erfl, erfc, erfcf, erfcl
    • lgamma, lgammaf, lgammal, tgamma, tgammaf, tgammal
    • nearbyint, nearbyintf, nearbyintl, nan, nanf, nanl
    • rint, rintf, rintl, lrint, lrintf, lrintl, llrint, llrintf, llrintl
    • round, roundf, roundl, lround, lroundf, lroundl, llround, llroundf, llroundl
    • trunc, truncf, truncl, remainder, remainderf, remainderl, remquo, remquof, remquol
    • nextafter, nextafterf, nextafterl, nexttoward, nexttowardf, nexttowardl
    • fdim, fdimf, fdiml, fmax, fmaxf, fmaxl, fmin, fminf, fminl, fma, fmaf, fmal
  • complex.h: 
    • cacos, cacosf, cacosl, casin, casinf, casinl, catan, catanf, catanl
    • ccos, ccosf, ccosl, csin, csinf, csinl, ctan, ctanf, ctanl
    • cacosh, cacoshf, cacoshl, casinh, casinhf, casinhl, catanh, catanhf, catanhl
    • ccosh, ccoshf, ccoshl, csinh, csinhf, csinhl, ctanh, ctanhf, ctanhl
    • cexp, cexpf, cexpl, clog, clogf, clogl, cabs, cabsf, cabsl
    • cpow, cpowf, cpowl, csqrt, csqrtf, csqrtl, carg, cargf, cargl
    • cimag, cimagf, cimagl, conj, conjf, conjl, cproj, cprojf, cprojl, creal, crealf, creall
  • fenv.h: 
    • fegetenv, fesetenv, feupdateenv, fegetexceptflag, fesetexceptflag
    • feclearexcept, feholdexcept, fetestexcept, feraiseexcept
  • inttypes.h:
    • PRIi8, PRIi16, PRIi32, PRIi64, PRIiMAX, PRIiPTR, PRIiLEAST8, PRIiLEAST16, PRIiLEAST32, PRIiLEAST64, PRIiFAST8, PRIiFAST16, PRIiFAST32, PRIiFAST64
    • PRIo8, PRIo16, PRIo32, PRIo64, PRIoMAX, PRIoPTR, PRIoLEAST8, PRIoLEAST16, PRIoLEAST32, PRIoLEAST64, PRIoFAST8, PRIoFAST16, PRIoFAST32, PRIoFAST64
    • PRIu8, PRIu16, PRIu32, PRIu64, PRIuMAX, PRIuPTR, PRIuLEAST8, PRIuLEAST16, PRIuLEAST32, PRIuLEAST64, PRIuFAST8, PRIuFAST16, PRIuFAST32, PRIuFAST64
    • PRIx8, PRIx16, PRIx32, PRIx64, PRIxMAX, PRIxPTR, PRIxLEAST8, PRIxLEAST16, PRIxLEAST32, PRIxLEAST64, PRIxFAST8, PRIxFAST16, PRIxFAST32, PRIxFAST64
    • PRIX8, PRIX16, PRIX32, PRIX64, PRIXMAX, PRIXPTR, PRIXLEAST8, PRIXLEAST16, PRIXLEAST32, PRIXLEAST64, PRIXFAST8, PRIXFAST16, PRIXFAST32, PRIXFAST64
    • SCNd8, SCNd16, SCNd32, SCNd64, SCNdMAX, SCNdPTR, SCNdLEAST8, SCNdLEAST16, SCNdLEAST32, SCNdLEAST64, SCNdFAST8, SCNdFAST16, SCNdFAST32, SCNdFAST64
    • SCNi8, SCNi16, SCNi32, SCNi64, SCNiMAX, SCNiPTR, SCNiLEAST8, SCNiLEAST16, SCNiLEAST32, SCNiLEAST64, SCNiFAST8, SCNiFAST16, SCNiFAST32, SCNiFAST64
    • SCNo8, SCNo16, SCNo32, SCNo64, SCNoMAX, SCNoPTR, SCNoLEAST8, SCNoLEAST16, SCNoLEAST32, SCNoLEAST64, SCNoFAST8, SCNoFAST16, SCNoFAST32, SCNoFAST64
    • SCNu8, SCNu16, SCNu32, SCNu64, SCNuMAX, SCNuPTR, SCNuLEAST8, SCNuLEAST16, SCNuLEAST32, SCNuLEAST64, SCNuFAST8, SCNuFAST16, SCNuFAST32, SCNuFAST64
    • SCNx8, SCNx16, SCNx32, SCNx64, SCNxMAX, SCNxPTR, SCNxLEAST8, SCNxLEAST16, SCNxLEAST32, SCNxLEAST64, SCNxFAST8, SCNxFAST16, SCNxFAST32, SCNxFAST64
    • SCNX8, SCNX16, SCNX32, SCNX64, SCNXMAX, SCNXPTR, SCNXLEAST8, SCNXLEAST16, SCNXLEAST32, SCNXLEAST64, SCNXFAST8, SCNXFAST16, SCNXFAST32, SCNXFAST64
    • imaxabs, imaxdiv, strtoimax, strtoumax, wcstoimax, wcstoumax
  • ctype.h
    • isblank
  • wctype.h
    • iswblank
  • float.h
    • DECIMAL_DIG, FLT_EVAL_METHOD
  • stdarg.h
    • va_copy
  • stdbool.h
    • bool, true, false, __bool_true_false_are_defined
  • stdio.h
    • vscanf, vfscanf, vsscanf
  • stdlib.h
    • atoll, strtof, strtold, strtoll, strtoull
  • wchar.h
    • vwscanf, vfwscanf, vswscanf, wcstof, wcstold, wcstoll, wcstoull
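
As a quick illustration, here is a minimal sketch that exercises a few of the newly added functions and macros from the lists above (math.h, stdlib.h and inttypes.h):

#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>

int main() {
    double r = cbrt(27.0);                            // new in math.h: 3.0
    long long big = strtoll("9000000000", NULL, 10);  // new in stdlib.h
    int64_t n = big * 2;
    printf("cbrt(27) = %.1f, finite: %d\n", r, isfinite(r));
    printf("n = %" PRId64 "\n", n);                   // new inttypes.h macro
    printf("llround(2.5) = %lld\n", llround(2.5));    // new in math.h
    return 0;
}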

We know that this is not complete support for the C99 library functions. To the best of our understanding, the missing pieces are these:

  • The tgmath.h header is missing. C compiler support is needed for this header.
    • Note that the ctgmath header was added—this is possible because that header does not require the tgmath.h header—only the ccomplex and cmath headers.
  • The uchar.h header is missing. This is from the C Unicode TR.
  • Several format specifiers in the printf family are not yet supported.
  • The snprintf and snwprintf functions are missing from stdio.h and wchar.h.

I hope you find this information useful. We did all that we had time for, while trying to prioritize those functions we thought most important.

Pat Brenner, Visual C++ Libraries Development Team

New Features in C++/CX's <collection.h> for VS 2013 RTM


Introduction:

Hi, I’m Brandon Jacobs, an intern on the Visual C++ Libraries team. For part of my internship, I was tasked with adding new features to Stephan T. Lavavej’s <collection.h>. It was certainly an honor to be one of the few to contribute to <collection.h>. You can find these changes in VS 2013 RTM (these changes are not in 2013 Preview).

 

Summary:

These are the changes I’ve made to <collection.h>:

1. Added UnorderedMap and UnorderedMapView wrapper classes to <collection.h>.

2. Added initializer_list constructors to Vector, VectorView, Map, MapView, UnorderedMap, and UnorderedMapView.

3. Added validation so that customers use valid WinRT types to instantiate the collections in <collection.h>.

 

Features:

UnorderedMap and UnorderedMapView:

This is the wrapper class for the std::unordered_map class. The functionality is the same as Map and MapView; the only difference is the underlying data structure, which is a hash table instead of a balanced binary tree. Types must therefore be hashable and support equality comparison. Just as Map and MapView default to std::less, UnorderedMap and UnorderedMapView default to std::hash and std::equal_to. Both the hash and equality predicates are template parameters, so you can change the behavior of UnorderedMap and UnorderedMapView by providing your own predicates, as in the sketch below.
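
A minimal sketch (MyHash and MyEqual are hypothetical functors you would supply; check collection.h for the exact template-parameter order):

#include <collection.h>
namespace WFC = Windows::Foundation::Collections;
namespace PC = Platform::Collections;

void Demo() {
    // Defaults: std::hash<int> and std::equal_to<int>.
    WFC::IMap<int, int>^ m = ref new PC::UnorderedMap<int, int>();
    m->Insert(1, 10);

    // With your own predicates (hypothetical functors MyHash/MyEqual):
    // WFC::IMap<int, int>^ m2 =
    //     ref new PC::UnorderedMap<int, int, MyHash, MyEqual>();
}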

 

initializer_list constructors:

You can now construct any data structure using the C++11 initializer_list constructors.

An example:

namespace WFC = Windows::Foundation::Collections;
namespace PC = Platform::Collections;

WFC::IVector<int>^ v = ref new PC::Vector<int>{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

WFC::IMap<int, int>^ m
    = ref new PC::Map<int, int>{ { 1, 10 }, { 2, 20 }, { 3, 30 }, { 4, 40 } };

WFC::IMap<int, int>^ m2
    = ref new PC::UnorderedMap<int, int>{ { 1, 10 }, { 2, 20 }, { 3, 30 }, { 4, 40 } };

 

Valid WinRT types only:

We now check whether the type you want to store in a given data structure is a valid WinRT type. Previously, if you wrote:

PC::Vector<ordinary_cpp_class>

you would get an odd compiler error. Now we check whether the types passed in are valid WinRT types; if this check fails, you get a much nicer compiler error that even includes the line on which you tried to create a collection with an invalid type.

Only the items that will appear in the Windows::Foundation::Collections interfaces need to be valid WinRT types. Predicates such as std::less, std::hash, etc. are not passed into the Windows::Foundation::Collections interfaces, so they are not affected by that restriction.

Valid WinRT types are:

  1. integers
  2. interface class^
  3. public ref class^
  4. value struct
  5. public enum class

 

Thank you for taking the time to read this post,

Brandon Jacobs

SDE Intern – Visual C++ Libraries

C++ IDE Performance Improvement in Visual Studio 2013 Preview


My name is Li Shao. I am a Senior Software Design Engineer in Test on the VC++ team. In this blog, I would like to share the performance enhancements we've made in VS 2013 Preview to improve the C++ IDE and build system.

Performance is a vital part of software quality. Over the last couple of releases we have made significant performance improvements in areas such as solution load, IntelliSense and code navigation. In VS 2013, we focused our performance effort on a few other key scenarios such as configuration switching, responsiveness and build throughput. We also improved the way we monitor real-world performance, something we refer to as Performance Telemetry.

 

Faster Configuration Switching

Based on feedback from C++ developers, we focused our performance tuning effort on configuration switching. We used a real-world application with 800 projects as a benchmark solution for driving performance gains. One of the major contributors to configuration switching time, we discovered, had to do with the complexity of the file structure in the solution navigator. We had some inefficient ways of re-computing file expansion state and filters; by focusing on this code, plus a few additional bugs along the way, we drove some very significant improvements in configuration switching times. Below is the data before and after the performance improvement for this real-world application:

 

Figure 1: Performance improvement in C++ configuration switch operation

For this solution:

  • First-time configuration switch time improved by roughly 75% compared to VS 2012
  • Subsequent configuration switch time improved from one and a half minutes to less than half a second

In addition to performance tuning, we have also added telemetry so that we can find additional opportunities to improve performance across all solutions. Below is a snapshot of the telemetry data for switching configuration:

Figure 2: Telemetry data on configuration switching

The X-axis is the time to perform the configuration switch operation, in milliseconds. The Y-axis on the left is the number of occurrences in each bucket. The Y-axis on the right is the cumulative percentage in each bucket.

From the telemetry data, we can see that the median elapsed time for configuration switching is now less than 100ms, and fewer than 10% of instances take more than 6s. If you still feel that Visual Studio 2013 configuration switching performance is not up to your expectations, please help us by providing a repro and a profile using the Visual Studio 2013 Preview Feedback Tools.

 

Better responsiveness

One major infrastructure investment we have made to improve performance is to move the C++ project system to a multi-threaded model. This lets most code in the project system run on any thread, allowing for concurrent execution and better responsiveness. This work also reduced the sources of hangs. For example, in VS 2012, an individual project was loaded in about 3 chunks, and each happened on the UI thread, which included I/O. In VS 2013, with the multi-threading work, we are able to do much of the work (including I/O) off the main thread completely. The parts that remain on the UI thread are executed in smaller chunks, so responsiveness is much more consistent than before. My colleague Andrew Arnott will discuss this work in more detail in a future blog post.

For the editing experience, we now have telemetry infrastructure in place to help us monitor real-world typing responsiveness. Figure 3 shows the telemetry data for typing responsiveness in VS 2013. You may remember Eric Knox's blog post on VS 2012 typing responsiveness. VS 2013 is on par with VS 2012, with 98.8% of C++ keystrokes processed in under 50ms (Figure 3.A). Within the >50ms range, we are also seeing data points shift from the 100-250ms bucket (VS 2012: 0.48%, VS 2013: 0.30%) to the 50-100ms bucket (VS 2012: 0.37%, VS 2013: 0.57%) (Figure 3.B). The VS 2013 Preview C++ median typing responsiveness is 15ms (Figure 3.C).

 


Figure 3: VS 2013 C++ editor typing responsiveness telemetry data

In addition to typing responsiveness, we've also fixed a variety of performance bugs, including an issue that could result in an O(n^2) algorithm when saving files that are checked into source code control. This fix can greatly improve the experience when dealing with large applications.

 

Faster build throughput

For the VS 2013 Preview release, we have made various fixes to improve compiler throughput. Based on our internal measurements, we have improved compiler throughput by 10% in scenarios that do not use the multi-processor compilation flag (the non-/MP case) (Figure 4.A).

For those of you who are using the /MP compilation switch, we have also reduced the compiler overhead and improved overall build throughput on machines with many cores. Figure 4.B shows the throughput improvement when building a large internal application on a 48-core machine. In this example, the overall build time was almost cut in half in VS 2013, compared to VS 2012.

 


Figure 4: C++ build throughput improvement

 

Faster incremental build for Windows Store apps with WinMD references

In VS 2012, for a Windows Store app that references one or more WinMD projects (projects that generate WinMDs as output), a change to a source file in one of the WinMD projects triggers a full build of the app. In VS 2013, we have made a fix so that the linker timestamp of the WinMD does not change when there is no content change to the WinMD file. This speeds up the overall build when making non-content changes to a referenced C++ WinMD. Figure 5 shows an example improvement with one of our internal Windows applications. The same change in the WinMD project took 15 minutes to build in VS 2012, almost as long as building the entire solution. In VS 2013, the incremental build takes only a minute.

 

Figure 5: C++ Windows Store application incremental build throughput improvement

Note that you may not see the full performance improvement if you have managed WinMD projects as part of the reference chain. The reason is that, unlike C++ WinMDs, managed WinMDs contain both type signatures and implementation, which means any change causes the WinMD timestamp to change. We'd love to hear from you in the comments section below or on our UserVoice site if this scenario is important to you.

 

More extensive performance telemetry

As I mentioned earlier, we made performance improvements in the IntelliSense and code navigation (browsing) features in the last release. To help us continue to improve performance in these scenarios, we now collect telemetry for IntelliSense/browsing operations, including member list population, auto-completion, quick info, parameter help, go-to-definition and a few other related operations. Figure 6 shows examples of the trend data now available to us. For example, in the trend data below, we can see that the median elapsed times for auto-completion, member list and go-to-definition are 105ms, 175ms and 226ms respectively. The majority (90%) of auto-completion and member list operations now complete within 200ms. We'll be using this data to prioritize where we invest to improve performance over time; you can help by opting into our Customer Experience Improvement Program (see this blog post for more details).

 


 

Figure 6: Telemetry data for IntelliSense and code navigation

Above are the important performance improvements and infrastructure work we have done for VS 2013 Preview to improve the IDE and build throughput experience. We appreciate your continued feedback. If you have not already done so, please download the VS 2013 Preview release and provide us with feedback in the comments section below, through UserVoice, or using the VS 2013 Preview feedback tool, to help us prioritize the work we do to improve the experience of working with Visual Studio.

Optimizing C++ Code : Dead Code Elimination


 

If you have arrived in the middle of this blog series, you might want instead to begin at the beginning.

This post examines the optimization called Dead-Code-Elimination, which I’ll abbreviate to DCE.  It does what it says: discards any calculations whose results are not actually used by the program.

Now, you will probably assert that your code calculates only results that are used, and never any results that are not used: only an idiot, after all, would gratuitously add useless code – calculating the first 1000 digits of pi, for example, whilst also doing something useful.  So when would the DCE optimization ever have an effect?

The reason for describing DCE so early in this series is that it could otherwise wreak havoc and confusion throughout our exploration of other, more interesting optimizations: consider this tiny example program, held in a file called Sum.cpp:

int main() {
    long long s = 0;
    for (long long i = 1; i <= 1000000000; ++i) s += i;
}

We are interested in how fast the loop executes, calculating the sum of the first billion integers.  (Yes, this is a spectacularly silly example, since the result has a closed formula, taught in high school.  But that's not the point.)

Build this program with the command:  CL /Od /FA Sum.cpp  and run with the command Sum.  Note that this build disables optimizations, via the /Od switch.  On my PC, it takes about 4 seconds to run.  Now try compiling optimized-for-speed, using CL /O2 /FA Sum.cpp.  On my PC, this version runs so fast there’s no perceptible delay.  Has the compiler really done such a fantastic job at optimizing our code?  The answer is no (but in a surprising way, also yes):

Let’s look first at the code it generates for the /Od case, stored into Sum.asm.  I have trimmed and annotated the text to show only the loop body:

       mov    QWORD PTR s$[rsp], 0                     ;; long long s = 0
       mov    QWORD PTR i$1[rsp], 1                    ;; long long i = 1
       jmp    SHORT $LN3@main  
$LN2@main:

       mov    rax, QWORD PTR i$1[rsp]                  ;; rax = i
       inc    rax                                      ;; rax += 1
       mov    QWORD PTR i$1[rsp], rax                  ;; i = rax 
$LN3@main:
       cmp    QWORD PTR i$1[rsp], 1000000000           ;; i <= 1000000000 ?
       jg     SHORT $LN1@main                          ;; no – we’re done
       mov    rax, QWORD PTR i$1[rsp]                  ;; rax = i
       mov    rcx, QWORD PTR s$[rsp]                   ;; rcx = s
       add    rcx, rax                                 ;; rcx += rax
       mov    rax, rcx                                 ;; rax = rcx
       mov    QWORD PTR s$[rsp], rax                   ;; s = rax
       jmp    SHORT $LN2@main                          ;; loop
$LN1@main:

The instructions look pretty much like you would expect.  The variable i is held on the stack at offset i$1 from the location pointed-to by the RSP register; elsewhere in the asm file, we find that i$1 = 0.  We use the RAX register to increment i.  Similarly, variable s is held on the stack (at offset s$ from the location pointed-to by the RSP register; elsewhere in the asm file, we find that s$ = 8).  The code uses RCX to calculate the running sum each time around the loop.

Notice how we load the value for i from its “home” location on the stack, every time around the loop; and write the new value back to its “home” location.  The same for the variable s.  We could describe this code as “naïve” – it’s been generated by a dumb compiler (i.e., one with optimizations disabled).  For example, it’s wasteful to keep accessing memory for every iteration of the loop, when we could have kept the values for i and s in registers throughout.

So much for the non-optimized code.  What about the code generated for the optimized case?  Let’s look at the corresponding Sum.asm for the optimized, /O2, build.  Again, I have trimmed the file down to just the part that implements the loop body, and the answer is:

                                                       ;; there’s nothing here!

Yes – it’s empty!  There are no instructions that calculate the value of s.  

Well, that answer is clearly wrong, you might say.  But how do we know the answer is wrong?  The optimizer has reasoned that the program does not actually make use of the answer for s at any point; and so it has not bothered to include any code to calculate it.  You can’t say the answer is wrong, unless you check that answer, right?

We have just fallen victim to, been mugged in the street by, and lost our shirt to, the DCE optimization.  If you don’t observe an answer, the program (often) won’t calculate that answer. 

You might view this effect in the optimizer as analogous, in a profoundly shallow way, to a fundamental result in Quantum Physics, often paraphrased in articles on popular science as “If a tree falls in the forest, and there’s nobody around to hear, does it make a sound?”

Well, let’s “observe” the answer in our program, by adding a printf of variable s, into our code, as follows:

#include <stdio.h>
int main() {
    long long s = 0;
    for (long long i = 1; i <= 1000000000; ++i) s += i;
    printf("%lld ", s);
}
The /Od version of this program prints the correct answer, still taking about 4 seconds to run.  The /O2 version prints the same, correct answer, but runs much faster.   (See the optional section below for the value of “much faster” – in fact, the speedup is around 7X)

At this point, we have explained the main point of this blog post: always be very careful, in analyzing compiler optimizations and in measuring their benefit, that we are not being misled by DCE.  Here are four steps to help notice, and ward off, the unasked-for attention of DCE:

  • Check that the timings have not suddenly improved by an order of magnitude
  • Examine the generated code (using the /FA switch)
  • If in doubt, add a strategic printf
  • Put the code of interest into its own .CPP file, separate from the one holding main (see the sketch below).  This works, so long as you do not request whole-program optimization (via the /GL switch, which we'll cover later)
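
For the last item, here is a minimal sketch of the separate-file technique (the file layout and function name are illustrative):

// Sum.cpp -- compiled separately; without /GL the compiler must assume
// the caller uses the result, so the summation loop survives optimization.
long long sum_to(long long n) {
    long long s = 0;
    for (long long i = 1; i <= n; ++i) s += i;
    return s;
}

// Main.cpp
long long sum_to(long long n);
int main() {
    return static_cast<int>(sum_to(1000000000) & 0x7f);
}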

However, we can wring several more interesting lessons from this tiny example.  I have marked these sections below as “Optional-1” through “Optional-4”.

  

Optional-1 : Codegen for /O2

Why is our /O2 version (including a printf to defeat DCE), so much faster than the /Od version?  Here is the code generated for the loop of the /O2 version, extracted from the Sum.asm file:

       xor    edx, edx
       mov    eax, 1 
       mov    ecx, edx
       mov    r8d, edx
       mov    r9d, edx
       npad   13
$LL3@main:
       inc    r9
       add    r8, 2
       add    rcx, 3
       add    r9, rax                           ;; r9  = 2  8 18 32 50 ...
       add    r8, rax                           ;; r8  = 3 10 21 36 55 ...
       add    rcx, rax                          ;; rcx = 4 12 24 40 60 ...
       add    rdx, rax                          ;; rdx = 1  6 15 28 45 ...
       add    rax, 4                            ;; rax = 1  5  9 13 17 ...
       cmp    rax, 1000000000                   ;; i <= 1000000000 ?
       jle    SHORT $LL3@main                   ;; yes, so loop back

Note that the loop body contains approximately the same number of instructions as the non-optimized build, and yet it runs much faster.  That’s mostly because the instructions in this optimized loop use registers, rather than memory locations.  As we all know, accessing a register is much faster than accessing memory.  Here are approximate latencies that demonstrate how memory access can slow your program to a snail’s pace:

Location    Latency
Register    1 cycle
L1          4 cycles
L2          10 cycles
L3          75 cycles
DRAM        60 ns

So, the non-optimized version is reading and writing to stack locations, which will quickly migrate into L2 (10 cycle access time) and L1 cache (4 cycle access time).  Both are slower than if all the calculation is done in registers, with access times around a single cycle.

But there’s more going on here to make the code run faster.  Notice how the /Od version increments the loop counter by 1 each time around the loop.  But the /O2 version increments the loop counter (held in register RAX) by 4 each time around the loop.  Eh?

The optimizer has unrolled the loop.  So it adds four items together on each iteration, like this:

s = (1 + 2 + 3 + 4) + (5 + 6 + 7 + 8) + (9 + 10 + 11 + 12) + (13 + . . .

By unrolling this loop, the test for loop-end is made every four iterations, rather than on every iteration – so the CPU spends more time doing useful work than forever checking whether it can stop yet!

Also, rather than accumulate the results into a single location, it has decided to use 4 separate registers, to accumulate 4 partial sums, like this:

RDX = 1 + 5 +  9 + 13 + ...  =  1,  6, 15, 28 ...
R9  = 2 + 6 + 10 + 14 + ...  =  2,  8, 18, 32 ...
R8  = 3 + 7 + 11 + 15 + ...  =  3, 10, 21, 36 ...
RCX = 4 + 8 + 12 + 16 + ...  =  4, 12, 24, 40 ...

At the end of the loop, it adds the partial sums, in these four registers, together, to get the final answer.
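
Expressed back in C++, the transformation is roughly equivalent to this sketch (ignoring, for clarity, any iterations left over when the trip count is not a multiple of 4):

long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
for (long long i = 1; i + 3 <= 1000000000; i += 4) {
    s0 += i;        // accumulates 1, 5,  9, 13, ...
    s1 += i + 1;    // accumulates 2, 6, 10, 14, ...
    s2 += i + 2;    // accumulates 3, 7, 11, 15, ...
    s3 += i + 3;    // accumulates 4, 8, 12, 16, ...
}
long long s = s0 + s1 + s2 + s3;    // combine the partial sums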

(I will leave it as an exercise for the reader how the optimizer copes with a loop whose trip count is not a multiple of 4)

 

Optional-2 : Accurate Performance Measurements

Earlier, I said that the /O2 version of the program, without a printf inhibitor, “runs so fast there’s no perceptible delay”.  Here is a program that makes that statement more precise:

#include <stdio.h>
#include <windows.h>
int main() {
  LARGE_INTEGER start, stop;
  QueryPerformanceCounter(&start);
    long long s = 0;
    for (long long i = 1; i <= 1000000000; ++i) s += i;
  QueryPerformanceCounter(&stop);
  double diff = stop.QuadPart - start.QuadPart;
  printf("%f", diff);
}

It uses QueryPerformanceCounter to measure the elapsed time.  (This is a “sawn-off” version of a high-resolution timer described in a previous blog).  There are lots of caveats to bear in-mind when measuring performance (you might want to browse a list I wrote previously), but they don’t matter for this particular case, as we shall see in a moment:

On my PC, the /Od version of this program prints a value for  diff of about 7 million somethings.  (The units of the answer don’t matter – just know that the number gets bigger as the program takes longer to run).  The /O2 version prints a value for diff of 0.  And the cause, as explained above, is our friend DCE.

When we add a printf to prevent DCE, the /O2 version runs in about 1 million somethings - a speedup of about 7X.

 

Optional-3 : x64 Assembler “widening”

If we look carefully again at the assembler listings in this post, we find a few surprises in the part of the program that initializes our registers:

       xor    edx, edx                          ;; rdx = 0     (64-bit!)
       mov    eax, 1                            ;; rax = i = 1 (64-bit!)
       mov    ecx, edx                          ;; rcx = 0     (64-bit!)
       mov    r8d, edx                          ;; r8  = 0     (64-bit!)
       mov    r9d, edx                          ;; r9  = 0     (64-bit!)
       npad   13                                ;; multi-byte nop alignment padding
$LL3@main:

Recall first that the original C++ program uses long long variables for both the loop counter and the sum.  In the VC++ compiler, these map onto 64-bit integers.  So we should expect the generated code to make use of x64’s 64-bit registers.

We already explained, in a previous post, that xor reg, reg is a compact way to zero out the contents of reg.  But our first instruction applies xor to register EDX – the lower 32 bits of the RDX register.  The next instruction moves a 1 into EAX, the lower 32 bits of the RAX register.  This same pattern – of moving a value into a 32-bit register – continues with the next 3 instructions.  Taken at face value, this would leave the higher 32 bits of each target register containing random junk.  And yet the loop body performs calculations on the full-width, 64-bit registers.  How can the answers possibly be right?

The answer is that the x64 instruction set, originally published by AMD, chose to automatically zero-extend into the high 32 bits of a 64-bit destination register.  Here are the two applicable bullets from section 3.4.5 of AMD's manual:

  • Zero-Extension of 32-Bit Results: 32-bit results are zero-extended into the high 32 bits of 64-bit GPR destination registers.
  • No Extension of 8-Bit and 16-Bit Results: 8-bit and 16-bit results leave the high 56 or 48 bits, respectively, of 64-bit GPR destination registers unchanged.
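
You can see the first rule at work in a trivial function.  Here is a minimal sketch (the instruction in the comment is typical VC++ x64 output, not guaranteed):

// Widening an unsigned 32-bit value to 64 bits costs no extra
// instruction on x64:
unsigned long long widen(unsigned int x) {
    return x;    // typically compiles to just: mov eax, ecx
                 // (the high 32 bits of RAX are zeroed automatically)
}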

Finally, notice the npad 13 instruction (really a pseudo-operation – a directive to the assembler).  This ensures the next instruction – which is the start of the loop body – will be aligned in memory on an address that’s a multiple of 16 bytes.  This improves performance (sometimes, and on some micro-architectures).

  

Optional-4 : printf versus std::cout

You might ask why I used C's printf function, rather than C++ std::cout, as a way to defeat DCE in the above experiment.  Try it and see.  Both work ok, but the asm file generated by the latter is much bigger, and therefore more difficult to navigate around: 0.7 Mbytes compared with 1.7 Kbytes.
ATL and MFC changes and fixes in Visual Studio 2013


Hello, I’m Pat Brenner, a developer on the Visual C++ Libraries team.  In this blog post I would like to share with you the changes that we’ve made in ATL and MFC for Visual Studio 2013.

One of the major changes we made was to eliminate the ATL DLL altogether.  All ATL code is now static, either in the header files or in the ATL static library.  We also reduced the amount of code in the ATL static library substantially, so there are no longer multiple static libraries for debug/release mode or Unicode/ANSI character set.  There is only one ATL static library that is common to all configurations. 

The changes to ATL also included the elimination of the ATL/MFC Trace Tool and the simplification of the tracing mechanism.  The TRACE macros now essentially boil down to OutputDebugString, and there is no external controller of the tracing level (as the Trace Tool previously provided)—the tracing level is set in the application itself.  This does cause source-breaking changes in some uses of the ATL::CTraceCategory class, which will require source code changes when migrating to Visual Studio 2013.
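
Here is a minimal sketch of the simplified model, using the long-standing ATLTRACE macros and one of ATL's predefined trace categories (how you set the tracing level in your application is up to you):

#include <atlbase.h>
#include <atltrace.h>

void Demo() {
    // Both of these now essentially boil down to OutputDebugString:
    ATLTRACE(_T("simple trace message\n"));
    ATLTRACE2(atlTraceGeneral, 2, _T("value = %d\n"), 42);
}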

The major change we made in MFC was the deprecation of MBCS support (see more information in this separate blog post).

In addition, we fixed about 105 bugs in MFC and about 60 bugs in ATL.  About one-fourth of these bugs (in both libraries) were reported by customers.

Though I cannot provide a complete list of the bugs in our internal bug database, here is a list of the bugs that were reported by customers through our Connect site that have been fixed in ATL and MFC for Visual Studio 2013 RTM. Click on any Connect bug number to see more information about that bug.  Note that most of these bugs were fixed for the Preview release as well.

Connect # – Bug Title

710163 – atlbase.h disables no longer existing C4217 warning
714790 – Useless line of code in AtlSafeRealloc()
714791 – AtlSafeRealloc() treats failures inconsistently and this leads to memory leaks
714840 – CAtlServiceModuleT::LogEventEx() contains a useless check
714802 – Suspicious error handling code in CAtlExeModuleT::StartMonitor()
742895 – CComApartment::Apartment() leaks objects on edge cases
736213 – ATL::CComSafeArray::operator[] ambiguity
764800 – wrong/missing sal annotations on consumer oledb macros
750369 – Breaking change in how the ATL OLE DB CCommand::Execute method behaves
774871 – Certification fails for Windows Store App with ATL-based library
785396 – Uninstalling VS2012 Update 2 and repair of VS results in ATL files missing.
789669 – ATL CRBMap::Lookup code analysis markup issue
790309 – VC++11 regression: error C2338: db_command using v110 toolset
745790 – Static MFC executables produced by Visual Studio 2012 RC are huge
750838 – MFC loads DLLs using LoadLibraryEx with flag only supported on Windows8
757588 – CMFCRibbonBar::AddToTabs removes a wrong button from the m_arButtons array
763947 – EndDialog in OnInitDialog reopen Dialog
763517 – IMPLEMENT_DYNAMIC produces compile error for statically linked MFC projects
768378 – CMFCTabCtrl bug
769093 – MFC Edit Browse box not showing browse button.
772859 – Calling EndDialog() within OnInitDialog() causes the dialog to be displayed twice.
750859 – Visual Studio 11 Beta - bug running .exe in XP service pack 3
763474 – Errors detected in the Visual C++ 2012 libraries
760371 – LocalFree called twice in CDatabase (MFC 11)
710858 – MFC OLE-Server doesn't seem to support the new style MFC Toolbars
773463 – Attempting to use DrawStatusText after including afxwin.h results in link error
768257 – Problems with CRecordset::GetFieldValue(short nIndex, CDBVariant& varValue) in VS2012
772549 – x64 MFC Macro Bug - ON_WM_POWERBROADCAST() / CWnd::OnPowerBroadcast
773448 – CHttpFile::QueryInfo() returns "corrupted" CStrings with invalid lengths.
778201 – Missing MFC Functions
777604 – CWnd::GetScrollLimit returns 1 if scrolling is deactivated
781179 – CMFCPopupMenu crash when you click outside while submenu still open
781379 – CMFCShellTreeCtrl fails to handle some UNC pathnames correctly
781257 – MFC - CMFCTabCtrl - when style is STYLE_3D_VS2005 and SetActiveTabBoldFont() is set
789970 – Unpaired pragma warning push/pop in afxwin.h in Release build
790246 – MFC: bad hard typecast in CMFCToolBarMenuButton::CompareWith
790975 – HTTP_QUERY_FLAG_REQUEST_HEADERS on CHttpFile::QueryInfo() asserts wrongly
792003 – CMFCShellListCtrl::OnContextMenu 'Delete' context menu handler does not work

 

I hope you find this information useful.  Please let us know if you have any questions.

Pat Brenner, Visual C++ Libraries Development Team


C++ IDE Improvements in Visual Studio 2013


When we considered what features to add to the C++ IDE in Visual Studio 2013, we decided to focus on improving the C++ code editing experience. We've added a number of features that will help you write and format your code more quickly, and will give you more useful information in IntelliSense. You can configure the behavior of most of these new features using the Options dialog from the Tools menu.

Code Formatting

One of my biggest frustrations with the C++ editor in Visual Studio over the past several years has been the lack of formatting when I pasted code into the editor. Combined with the lack of many common code formatting settings, this meant I spent quite a bit of time making my code look how I wanted, rather than actually writing new code.

In Visual Studio 2013 we added over 40 settings to help you control when and how your C/C++ code is formatted. We realize that there are many common formatting conventions for C/C++ code, so rather than dictate a single style of code formatting, we aim to give you the flexibility to adapt the settings to match your existing coding standards.

In some cases, we know that none of the options provided might be what you want. One particular example of this is controlling the exact position of opening curly braces for various block types. For these settings, we added an option to not apply a particular formatting rule so that you can control a particular aspect of your code formatting without needing to turn off formatting globally.

It's also possible to take advantage of code formatting without applying it automatically – simply go to the Text Editor -> C/C++ -> Formatting -> General page in the Options dialog, and uncheck all the boxes to turn off the auto-formatting behavior. You can then manually format your code using the Format Document and Format Selection commands from the Edit -> Advanced menu. This can be helpful if different areas of your codebase use different formatting standards, or if you need to make changes to 3rd party library code in your solution.

IntelliSense Improvements

We made some changes to Member List and Parameter Help so they provide more relevant information.

The Member List window now hides private members of types, except when you're editing code that defines the type.

The Parameter Help tooltip that appears when typing parameters of an overloaded function will now automatically switch to the best matching overload based on the number of parameters you've typed thus far. And it properly handles nested function calls – when you start typing a nested function call, Parameter Help will display results relevant to the nested call, then restore the contents of the tooltip to the outer function call when you close the argument list.

Small Changes that make a Big Difference

Have you ever used the "Go To Header File" command from the editor context menu? Now you can toggle back and forth between a header and its corresponding code file. There's even a default keyboard shortcut – Ctrl + K, Ctrl + O.

The editor can auto-generate event handlers in C++/CX and C++/CLI code files. You can choose to auto-generate just the delegate instance, or both the delegate instance and event handler function (both definition and declaration).

You may remember that in Visual Studio 2010, we changed Find All References to show just textual matches by default. You could have opted to have the compiler verify whether the textual results were actual matches.

We've changed Find All References to automatically resolve matches incrementally in the background after the textual matches are displayed – you don't have to resolve the references in order to get the information from the compiler.

We also added a toolbar to the Find Symbol Results window so it's easier to navigate the results and to stop resolution if desired.

And the Project Properties window is (finally) resizable.

Other Features

C++ developers can also take advantage of the improvements made to the common Visual Studio code editor, such as Peek Definition, Brace Completion, Enhanced Scrollbar, and updated Navigate To. You can learn more about these features in the Visual Studio blog post "Visual Studio 2013 New Editor Features".

Your Feedback Matters

One of the main reasons we ship preview releases of Visual Studio is to get your feedback and bug reports on new features and functionality, so we can fix the main issues before the RTM release. We regularly review bugs and other feedback reported via Connect and through the Visual Studio Send a Smile feature. As of this writing, the Visual Studio 2013 Preview has been available for about two months. Since that time we've fixed 14 bugs that you've reported (and several others we've found internally). Please keep on sending your feedback and reporting bugs. Even though we may not be able to fix every issue or respond to each piece of feedback, please know that we do read and consider all feedback we receive.

Wrap-up

For the complete list of Visual C++ IDE features in Visual Studio 2013, check out the What's New for Visual C++ in Visual Studio 2013 Preview page. And if there are features you'd like to see us add to the C++ IDE in future versions of Visual Studio, please vote for them on our UserVoice site.

How is your experience developing graphics apps?


Hello, my name is Rong Lu. I’m a PM on the Visual C++ team working on graphics development features in VS, including asset designers, templates, graphics diagnostics, etc. In preparation for planning the graphics tooling capabilities of the next version of Visual Studio, we’re trying to better understand the needs of graphics developers. We’d love to hear what your experience developing graphics apps with VS is like today, and what you’d like to see in the future. This will help us focus on the right things to meet the needs of graphics developers like you.

We’d appreciate it if you could take 15 minutes to complete the survey we prepared. If you know of other graphics developers or forums where graphics developers hang out, please forward it on.

http://aka.ms/graphicsdevsurvey

Thank you in advance for sharing your thoughts with the team! Your help is appreciated and does make an impact.

Ability to debug Optimized Code (Optimized Debugging)


Optimized debugging, in other words the ability to debug optimized code, has come up as an important feature request in previous surveys.

As we start planning for the next version of Visual Studio, we would like to better understand the experience that Visual Studio provides today.

In addition, we would like to gather requirements on what additional debugging information is needed to improve this experience (e.g., debugging information for locals).

To that end, we would really appreciate it if you could take 5-10 minutes to complete the survey we have prepared :)

Link to the survey for Optimized Debugging.

C++ REST SDK 1.2.0 is now available


C++ REST SDK 1.2.0 has been released on CodePlex. This version of the C++ REST SDK includes the following: 

  • Support for HTTPS for the Linux version of the library
  • Client side features for Windows XP 
  • Performance improvements, especially JSON parsing
  • Ability for side-by-side installation with v1.1.0
  • As always, bug fixes

For more information about this release and to view the sources, visit the project site at http://casablanca.codeplex.com

Feedback on Your C++ Development Activities


Hello. My name is Gabriel Ha and I am a program manager on the Visual C++ team.

Do you have 10-20 minutes to take a survey on C++ developer activities?

We want to get a better idea of what C++ developers frequently spend their time doing when they develop C++ code. We will use your feedback to make improvements to Visual Studio that aid you in the tasks you find yourself doing all the time (and we want your feedback even if you DON’T primarily use Visual Studio)!

You’ll also have the opportunity in the survey to comment on any pain points you have when you code (in Visual Studio or otherwise), and we’ll definitely try to address oft-cited pain points in the next release!

Ready? Head on over to http://aka.ms/cppactivities and take a few minutes to help us out.

We appreciate your time very much!

Best regards,
Gabriel Ha
gaha@microsoft.com
